arXiv:1503.04161v2 [math.ST] 18 Mar 2016 


Studentized U-quantile processes under dependence with
applications to change-point analysis

Daniel Vogel* and Martin Wendler 


Institute for Complex Systems & Mathematical Biology, University of Aberdeen, 

Aberdeen AB24 3UE, United Kingdom, daniel.vogel@abdn.ac.uk

Institut für Mathematik und Informatik, Ernst-Moritz-Arndt-Universität Greifswald,

17487 Greifswald, Germany, martin.wendler@uni-greifswald.de 

Abstract 

Many popular robust estimators are U-quantiles, most notably the Hodges-Lehmann location
estimator and the Qn scale estimator. We prove a functional central limit theorem for the
U-quantile process without any moment assumptions and under weak short-range dependence
conditions. We further devise an estimator for the long-run variance and show its consistency,
from which the convergence of the studentized version of the U-quantile process to a standard
Brownian motion follows. This result can be used to construct CUSUM-type change-point tests
based on U-quantiles, which do not rely on bootstrapping procedures. We demonstrate this
approach in detail with the example of the Hodges-Lehmann estimator for robustly detecting
changes in the central location. A simulation study confirms the very good efficiency and
robustness properties of the test. Two real-life data sets are analyzed.

keywords: CUSUM test, Hodges-Lehmann estimator, Long-run variance, Median, Near epoch
dependence, Robustness, Weak invariance principle.


1 Introduction 

Let $X_1, \ldots, X_n$ be a (not necessarily independent) sample from some univariate distribution $F$. For a symmetric, measurable function $g: \mathbb{R}^2 \to \mathbb{R}$, the average of the $\binom{n}{2}$ values $g(X_i, X_j)$, $1 \le i < j \le n$, is called a U-statistic with kernel $g$. If the data are independent, this is an unbiased estimator of the quantity $E\,g(X_1, X_2)$. A prominent textbook example is the scale estimator known as Gini's mean difference, which is obtained for $g(x,y) = |x - y|$.

Instead of taking the average, one may also consider the sample median of $g(X_i, X_j)$, $1 \le i < j \le n$, or more generally any sample $p$-quantile, $0 < p < 1$. Such a statistic is called a U-quantile. Several estimators that have gained popularity in robust statistics are U-quantiles. For instance, taking $p = 1/4$ and the above-mentioned kernel $g(x,y) = |x - y|$ yields the Qn scale estimator [40]. Similarly, choosing the sample median and the kernel $g(x,y) = (x+y)/2$ yields the Hodges-Lehmann estimator of location,

$$h_n = \operatorname{median}\{(X_i + X_j)/2 \mid 1 \le i < j \le n\}. \tag{1}$$
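A minimal sketch in Python (NumPy), not the authors' implementation: the Hodges-Lehmann estimator (1) is the median of all pairwise averages with $i < j$. The function name `hodges_lehmann` is our own, and `np.median` averages the two middle values when the number of pairs is even, which is only one of several conventions for the sample median. The toy data with one gross outlier illustrate the robustness relative to the mean.

```python
import numpy as np

def hodges_lehmann(x):
    """Median of the pairwise averages (X_i + X_j)/2 over 1 <= i < j <= n, cf. (1)."""
    x = np.asarray(x, dtype=float)
    i, j = np.triu_indices(len(x), k=1)  # all index pairs with i < j
    return float(np.median((x[i] + x[j]) / 2))

data = [1, 2, 3, 4, 100]      # one gross outlier
print(hodges_lehmann(data))   # -> 3.25
print(np.mean(data))          # -> 22.0
```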

The motivation for the present article originates in the authors' interest in robust change-point detection. Let us consider for a moment the change-point-in-location problem. Specifically, if we let $(Y_i)_{1 \le i \le n}$ be a centered stationary sequence and assume the data $(X_i)_{1 \le i \le n}$ to follow the model $X_i = Y_i + \mu_i$, $1 \le i \le n$, we want to test the hypothesis

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_n$$

against the alternative

$$H_1: \exists\, k \in \{1, \ldots, n-1\}: \mu_1 = \cdots = \mu_k \ne \mu_{k+1} = \cdots = \mu_n.$$

The usual CUSUM test statistic for detecting changes in the central location can be written as

$$T_{CS,n} = \max_{1 \le k \le n} \frac{k}{\sqrt{n}} \left|\bar{X}_k - \bar{X}_n\right|, \tag{2}$$

where $\bar{X}_k$ denotes the mean of the first $k$ observations. For a stationary sequence $(X_k)_{k \in \mathbb{Z}}$ satisfying suitable moment and short-range dependence conditions, $T_{CS,n}$ converges in distribution to $\sigma_{CS} \sup_{t \in [0,1]} |B(t)|$, where

$$\sigma_{CS}^2 = \sum_{k=-\infty}^{\infty} \operatorname{cov}(X_0, X_k) \tag{3}$$

is the long-run variance $\lim_{n \to \infty} n \operatorname{var}(\bar{X}_n)$ of the mean, and $B$ denotes a Brownian bridge. The main tool for proving the convergence of $T_{CS,n}$ is an invariance principle (or functional central limit theorem) for the partial sum process



$$\left(\frac{\lfloor ns \rfloor}{\sqrt{n}} \left(\bar{X}_{\lfloor ns \rfloor} - E X_1\right)\right)_{s \in [0,1]}, \tag{4}$$

which one may also view as a partial mean process. The first objective of the present paper is to establish a functional limit theorem under short-range dependence for the U-quantile process, i.e., the process obtained from (4) by replacing the sample mean by a U-quantile and $E X_i$ by the corresponding population value (Theorem 2.3). The second main theoretical contribution is to propose an estimator for the long-run variance term that appears in the limit process and to establish its consistency (Theorem 2.4). These results can be used to devise a CUSUM-type change-point test for location based on the Hodges-Lehmann estimator. Such a test is expected to be much more robust against heavy tails than the classical CUSUM test while retaining essentially the same efficiency under normality, since the Hodges-Lehmann estimator is known to have an asymptotic relative efficiency of 95% with respect to the mean at normality. Similarly, the classical approach to the change-in-scale detection problem is a CUSUM-type test statistic where the mean is replaced by the sample variance. This goes back to Inclan and Tiao [31] and has been extended to broader settings by several authors. This test is even more vulnerable to outliers and heavy tails. Our results can also be used to devise an alternative test for changes in the variability based on the highly robust Qn scale estimator.

The outline of the paper is as follows. The limit theorems for general U-quantiles are given in Section 2, with the proofs being deferred to the Appendix. In Section 3, we investigate the application of the results to the problem of change-in-location detection by means of the Hodges-Lehmann estimator. In Section 4, we analyze power and finite-sample properties of this test and compare it by means of numerical simulations to the classical CUSUM test and to a similar test based on the median. The simulation results confirm that the good efficiency and robustness properties of the Hodges-Lehmann estimator translate into similar properties of the test. The application of the test is demonstrated on two data examples in Section 5.

2 Limit theorems for U-quantiles under dependence

Let $(X_i)_{i \in \mathbb{Z}}$ be a strictly stationary sequence of random variables. The empirical $p$-U-quantile can be written as the generalized inverse $U_n^{-1}(p)$ of the empirical U-distribution function

$$U_n(t) = \frac{2}{n(n-1)} \sum_{1 \le i < j \le n} \mathbb{1}_{\{g(X_i, X_j) \le t\}}, \quad t \in \mathbb{R}.$$

To allow smoothed estimators of the generalized distribution function as well, we replace $\mathbb{1}_{\{g(x,y) \le t\}}$ by a more general function $h(x, y, t)$.


Definition 2.1. We call a nonnegative, bounded, measurable function $h: \mathbb{R} \times \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ which is symmetric in the first two arguments and non-decreasing in the third argument a U-quantile kernel function. For fixed $t \in \mathbb{R}$, we call

$$U_n(t) = \frac{2}{n(n-1)} \sum_{1 \le i < j \le n} h(X_i, X_j, t)$$

the U-statistic with kernel $h$ and the process $(U_n(t))_{t \in \mathbb{R}}$ the empirical U-distribution function.

We define the population U-distribution function as $U(t) = E[h(X, Y, t)]$, where $X, Y$ are independent with the same distribution as $X_0$. Furthermore, $U^{-1}(p) = \inf\{t \mid U(t) \ge p\}$ is called the $p$-U-quantile and $U_n^{-1}(p) = \inf\{t \mid U_n(t) \ge p\}$ the empirical $p$-U-quantile.
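The following sketch (our own helper names, indicator kernel $h(x,y,t) = \mathbb{1}_{\{g(x,y) \le t\}}$ only) evaluates $U_n(t)$ and the generalized inverse $U_n^{-1}(p)$ for an arbitrary symmetric kernel $g$. Since $U_n$ is a step function with jumps of size $1/N$, $N = n(n-1)/2$, at the sorted values $v_{(1)} \le \dots \le v_{(N)}$, the infimum is attained at $v_{(k)}$ with $k = \lceil pN \rceil$.

```python
import math
import numpy as np

def pairwise_values(x, g):
    """All values g(X_i, X_j), 1 <= i < j <= n, sorted increasingly."""
    x = np.asarray(x, dtype=float)
    i, j = np.triu_indices(len(x), k=1)
    return np.sort(g(x[i], x[j]))

def empirical_U(x, g, t):
    """U_n(t): fraction of pairs with g(X_i, X_j) <= t."""
    v = pairwise_values(x, g)
    return float(np.mean(v <= t))

def empirical_U_quantile(x, g, p):
    """U_n^{-1}(p) = inf{t : U_n(t) >= p} for 0 < p <= 1, i.e. v_(k) with k = ceil(p * N)."""
    v = pairwise_values(x, g)
    k = math.ceil(p * len(v) - 1e-9)  # small guard against floating-point error in p * N
    return float(v[max(k, 1) - 1])

x = [1.0, 2.0, 3.0, 10.0]
mean_kernel = lambda a, b: (a + b) / 2   # Hodges-Lehmann kernel
abs_kernel = lambda a, b: np.abs(a - b)  # kernel of Gini's mean difference and Qn
print(empirical_U(x, mean_kernel, 2.5))           # 3 of 6 pairs -> 0.5
print(empirical_U_quantile(x, mean_kernel, 0.5))  # -> 2.5
print(empirical_U_quantile(x, abs_kernel, 0.25))  # -> 1.0
```

Note that $U_n^{-1}(1/2)$ returns the lower of the two middle pairwise values (2.5 for these data), whereas the `np.median` convention would average them; for moderate $n$ and continuous data this difference is typically negligible.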

To study the empirical U-distribution function, we need a functional version of the Hoeffding decomposition. We write $U_n(t)$ as

$$U_n(t) = U(t) + \frac{2}{n} \sum_{i=1}^{n} h_1(X_i, t) + \frac{2}{n(n-1)} \sum_{1 \le i < j \le n} h_2(X_i, X_j, t),$$

where

$$h_1(x, t) = E\,h(x, X_0, t) - U(t), \tag{5}$$
$$h_2(x, y, t) = h(x, y, t) - h_1(x, t) - h_1(y, t) - U(t).$$
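The decomposition above is an exact algebraic identity, which can be checked numerically. The sketch assumes i.i.d. Uniform(0,1) data and the Hodges-Lehmann kernel $h(x,y,t) = \mathbb{1}_{\{(x+y)/2 \le t\}}$, for which $U(t)$ and $h_1$ have closed forms; the data distribution, the value of $t$, and all helper names are illustrative choices of ours.

```python
import numpy as np

def U_pop(t):
    """U(t) = P((X + Y)/2 <= t) for X, Y i.i.d. Uniform(0, 1): triangular cdf of X + Y at 2t."""
    s = 2.0 * t
    if s <= 0.0:
        return 0.0
    if s <= 1.0:
        return s * s / 2.0
    if s <= 2.0:
        return 1.0 - (2.0 - s) ** 2 / 2.0
    return 1.0

def h(x, y, t):
    return float((x + y) / 2.0 <= t)

def h1(x, t):
    """h_1(x, t) = E h(x, X_0, t) - U(t) = P(Y <= 2t - x) - U(t), cf. (5)."""
    return min(max(2.0 * t - x, 0.0), 1.0) - U_pop(t)

def h2(x, y, t):
    return h(x, y, t) - h1(x, t) - h1(y, t) - U_pop(t)

rng = np.random.default_rng(0)
x = rng.uniform(size=60)
t, n = 0.45, len(x)
pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]

U_n = 2.0 / (n * (n - 1)) * sum(h(x[i], x[j], t) for i, j in pairs)
linear = 2.0 / n * sum(h1(xi, t) for xi in x)
degenerate = 2.0 / (n * (n - 1)) * sum(h2(x[i], x[j], t) for i, j in pairs)

print(U_n, U_pop(t) + linear + degenerate)  # equal up to rounding error
print(abs(degenerate))                      # the degenerate part is typically of order 1/n
```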

U-quantiles can be analyzed using a generalized Bahadur representation. Bahadur showed that the empirical quantile can be approximated by a linear transform of the empirical distribution function. This was generalized by Geertsema [15] to U-quantiles of independent data. The rate of convergence was later improved by Choudhury and Serfling, Dehling et al. and Arcones [3]. A generalized Bahadur representation for U-quantiles of dependent data was recently established by Wendler [45].

Concerning the serial dependence structure of the process $(X_i)_{i \in \mathbb{Z}}$, we assume it to be near epoch dependent in probability (PNED) on an absolutely regular process. For two $\sigma$-fields $\mathcal{A}, \mathcal{B} \subset \mathcal{F}$ on the probability space $(\Omega, \mathcal{F}, P)$, the absolute regularity coefficient

$$\beta(\mathcal{A}, \mathcal{B}) = E\left[\sup_{A \in \mathcal{A}} |P(A \mid \mathcal{B}) - P(A)|\right]$$

is a measure of dependence of $\mathcal{A}$ and $\mathcal{B}$. Let $(Z_i)_{i \in \mathbb{Z}}$ be a stationary process. The absolute regularity coefficients of $(Z_i)_{i \in \mathbb{Z}}$ are given by

$$\beta_k = \beta\left(\sigma(\ldots, Z_{-1}, Z_0), \sigma(Z_k, Z_{k+1}, \ldots)\right), \quad k \in \mathbb{N}.$$

The process is called absolutely regular if $\beta_k \to 0$ as $k \to \infty$. We will not study absolutely regular processes themselves, as important classes of time series, such as linear processes, are not covered. Instead, we study processes which are near epoch dependent on absolutely regular processes.

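Definition 2.2. Let $((X_i, Z_i))_{i \in \mathbb{Z}}$ be a stationary process.

1. We say that $(X_i)_{i \in \mathbb{Z}}$ is $L_p$ near epoch dependent, $p \ge 1$, on the process $(Z_i)_{i \in \mathbb{Z}}$ with approximation constants $(a_{l,p})_{l \in \mathbb{N}}$ if $\lim_{l \to \infty} a_{l,p} = 0$ and

$$\left(E\left|X_0 - E(X_0 \mid \sigma(Z_{-l}, \ldots, Z_l))\right|^p\right)^{1/p} \le a_{l,p}, \quad l \in \{0, 1, 2, \ldots\}.$$

2. We say that $(X_i)_{i \in \mathbb{Z}}$ is near epoch dependent in probability (PNED) on the process $(Z_i)_{i \in \mathbb{Z}}$ with approximation constants $(a_l)_{l \in \mathbb{N}}$ if $a_l \to 0$ as $l \to \infty$ and there is a sequence of functions $f_l: \mathbb{R}^{2l+1} \to \mathbb{R}$ and a non-increasing function $\Phi: (0, \infty) \to (0, \infty)$ such that

$$P\left(|X_0 - f_l(Z_{-l}, \ldots, Z_l)| > \epsilon\right) \le a_l \Phi(\epsilon)$$

for all $l \in \mathbb{N}$ and $\epsilon > 0$.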
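As an illustration of Definition 2.2 (our own example, not taken from the paper), a causal Gaussian AR(1) process $X_0 = \sum_{j \ge 0} \phi^j Z_{-j}$ is PNED on its i.i.d. $N(0,1)$ innovations, which are trivially absolutely regular. With $f_l(Z_{-l}, \ldots, Z_l) = \sum_{j=0}^{l} \phi^j Z_{-j}$, Chebyshev's inequality gives the bound with $a_l = \phi^{2(l+1)}$ and $\Phi(\epsilon) = \{(1-\phi^2)\epsilon^2\}^{-1}$. The sketch compares this bound with the exact Gaussian tail probability; the value $\phi = 0.4$ and all names are assumptions for illustration.

```python
import math
from statistics import NormalDist

PHI_AR = 0.4  # AR coefficient (illustrative choice)

def tail_prob(l, eps):
    """Exact P(|X_0 - f_l| > eps): the error sum_{j>l} PHI_AR**j * Z_{-j}
    is N(0, PHI_AR**(2(l+1)) / (1 - PHI_AR**2))."""
    sd = math.sqrt(PHI_AR ** (2 * (l + 1)) / (1 - PHI_AR ** 2))
    return 2.0 * (1.0 - NormalDist().cdf(eps / sd))

def a(l):
    """Approximation constants a_l, decaying geometrically to 0."""
    return PHI_AR ** (2 * (l + 1))

def Phi_pned(eps):
    """Non-increasing function on (0, inf) obtained from Chebyshev's inequality."""
    return 1.0 / ((1 - PHI_AR ** 2) * eps ** 2)

# The PNED inequality P(|X_0 - f_l| > eps) <= a_l * Phi(eps) holds on a grid of (l, eps).
for l in range(6):
    for eps in (0.01, 0.1, 1.0):
        assert tail_prob(l, eps) <= a(l) * Phi_pned(eps) + 1e-15
print([tail_prob(l, 0.1) for l in range(6)])  # shrinks quickly as l grows
```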
Near epoch dependent processes are also called approximating functionals [e.g. 7]. This class of short-range dependent processes includes the time series models commonly used in econometrics, such as ARMA and GARCH processes, and furthermore also covers expanding dynamical systems, where the sequence $X_{n+1} = T(X_n)$ is deterministic apart from the initial value $X_0$.

We prefer to use near epoch dependence in probability (PNED) instead of the usual $L_2$ near epoch dependence since it does not necessitate the existence of any moments. We consider quantile-based estimators, a decisive advantage of which is that they are moment-free, and we do not want to limit the scope of our results in this respect by implicitly introducing moment assumptions through the short-range dependence conditions. The concept of PNED used here was introduced by Dehling et al. Similar concepts that embody the idea of approximating in a probability sense rather than an $L_p$ sense can be found under the name of S-mixing in Berkes et al. and under the name of $L_0$-approximability in Pötscher and Prucha [39, Chapter 6]. If $(X_i)_{i \in \mathbb{Z}}$ is near epoch dependent in probability on the process $(Z_i)_{i \in \mathbb{Z}}$, we can represent $X_n$ almost surely as $X_n = f_\infty((Z_{n+i})_{i \in \mathbb{Z}})$.

We will require the PNED approximation constants $a_l$ and the absolute regularity coefficients $\beta_k$ to fulfill certain rate conditions.

Assumption 1. The sequence $(X_i)_{i \in \mathbb{Z}}$ is PNED on an absolutely regular sequence $(Z_i)_{i \in \mathbb{Z}}$ such that $a_l \Phi(l^{-\ldots}) = O(l^{-\ldots})$ as $l \to \infty$ and $\sum \ldots < \infty$.

So far, the U-statistic kernel $g$ is completely arbitrary. In proofs for weakly dependent data, the dependent random variables are approximated by independent random variables. In order to control the error induced by this approximation, we require some form of continuity condition on $h$ with respect to the marginal distribution of the process.

Assumption 2. Let $0 < p < 1$ and $h: \mathbb{R} \times \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ be a bounded kernel function such that for a constant $L$, for all $t$ in a neighborhood of $U^{-1}(p)$ and all $\epsilon > 0$

$$E\left[\sup_{(x,y):\, \|(x,y) - (X,Y)\| \le \epsilon} |h(x, y, t) - h(X, Y, t)|\right] \le L\epsilon,$$

where $X, Y$ are independent with the same distribution as $X_0$ and $\|(x_1, x_2)\| = (x_1^2 + x_2^2)^{1/2}$ denotes the Euclidean norm.

This condition holds for all Lipschitz continuous kernel functions $h$. If Lipschitz continuity does not hold, as is the case for kernels of the type $h(x, y, t) = \mathbb{1}_{\{g(x,y) \le t\}}$, we need some regularity conditions on the distribution of $X_0$, cf. Remark 3.2 below.

Since we consider sample quantiles, we further require that the U-distribution function $U$ behaves regularly at $U^{-1}(p)$. Let $u(t) = U'(t)$ denote the derivative of the U-distribution function.

Assumption 3. Let $U(t) = E[h(X, Y, t)]$ be differentiable in a neighborhood of $U^{-1}(p) \in \mathbb{R}$ with $u(U^{-1}(p)) > 0$ and

$$\left|U(t) - p - u(U^{-1}(p))\,(t - U^{-1}(p))\right| = o\left(|t - U^{-1}(p)|^{\ldots}\right) \quad \text{as } t \to U^{-1}(p). \tag{6}$$

We are now ready to state the first of our two main results. 

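Theorem 2.3. Under Assumptions 1, 2 and 3 we have for the U-quantile process that

$$\left(\frac{\lfloor ns \rfloor}{\sqrt{n}}\left(U_{\lfloor ns \rfloor}^{-1}(p) - U^{-1}(p)\right)\right)_{s \in [0,1]} \Rightarrow \sigma_p W$$

in the Skorokhod space $D[0,1]$, where $W$ is a standard Brownian motion and

$$\sigma_p^2 = \frac{4}{u^2(U^{-1}(p))} \sum_{r=-\infty}^{\infty} \operatorname{cov}\left(h_1(X_0, U^{-1}(p)), h_1(X_r, U^{-1}(p))\right). \tag{7}$$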
Unless the distribution of the whole process $(X_i)_{i \in \mathbb{Z}}$ is fully specified, the long-run variance $\sigma_p^2$ is unknown. For statistical applications, it is therefore desirable to have an estimate of $\sigma_p^2$. The estimator we propose below is obtained by replacing all unknown quantities on the right-hand side of (7) by their empirical versions. We restrict our attention to the original situation where $h$ takes the form $h(x, y, t) = \mathbb{1}_{\{g(x,y) \le t\}}$. This allows us to directly apply the usual kernel density estimation to the U-statistic density $u$. Let


$$u_n(t) = \frac{2}{n(n-1)d_n} \sum_{1 \le i < j \le n} K\left(\frac{g(X_i, X_j) - t}{d_n}\right), \tag{8}$$

where $K$ is a density kernel and $d_n$ a bandwidth which fulfill the following conditions.

Assumption 4. The function $K$ is symmetric around 0, Lipschitz continuous with bounded support and bounded variation, and it integrates to 1. The bandwidth $d_n$ satisfies $d_n \to 0$ and $n d_n^{\ldots} \to \infty$ as $n \to \infty$.
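A sketch of the density estimator (8), assuming the Epanechnikov kernel (symmetric, Lipschitz, of bounded variation, compactly supported, integrating to 1, so it meets the kernel part of Assumption 4) and a hypothetical fixed bandwidth. For i.i.d. standard normal data and $g(x,y) = (x+y)/2$, the pairwise averages follow $N(0, 1/2)$, so the target value at $t = 0$ is $1/\sqrt{\pi}$.

```python
import numpy as np

def epanechnikov(z):
    """Symmetric, Lipschitz, compactly supported density kernel integrating to 1."""
    return np.where(np.abs(z) <= 1.0, 0.75 * (1.0 - z ** 2), 0.0)

def u_n(x, g, t, d_n):
    """Kernel density estimate (8) of the U-statistic density u at t."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    i, j = np.triu_indices(n, k=1)
    vals = g(x[i], x[j])
    return float(2.0 / (n * (n - 1) * d_n) * np.sum(epanechnikov((vals - t) / d_n)))

rng = np.random.default_rng(1)
x = rng.standard_normal(400)
d = 0.3  # hypothetical fixed bandwidth, for illustration only
est = u_n(x, lambda a, b: (a + b) / 2, 0.0, d)
print(est, 1 / np.sqrt(np.pi))  # estimate vs. N(0, 1/2) density at 0
```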

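Furthermore, we need an empirical version of $h_1$ from (5). Let

$$\hat{h}_1(x, t) = \frac{1}{n} \sum_{i=1}^{n} h(x, X_i, t) - \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} h(X_i, X_j, t),$$

and consider the sample autocovariance of $(\hat{h}_1(X_i, t))_{1 \le i \le n}$ for lag $r$, i.e.,

$$\hat{\rho}(r, t) = \frac{1}{n} \sum_{i=1}^{n-|r|} \hat{h}_1(X_i, t)\, \hat{h}_1(X_{i+|r|}, t).$$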

We estimate the infinite-sum part in (7) by a heteroscedasticity and autocorrelation consistent (HAC) kernel estimator, and define

$$\hat{\sigma}_{p,n}^2 = \frac{4}{u_n^2(U_n^{-1}(p))} \sum_{r=-(n-1)}^{n-1} W(|r|/b_n)\, \hat{\rho}(r, U_n^{-1}(p)),$$

where $W$ and $b_n$ fulfill the following conditions.

Assumption 5. The function $W: [0, \infty) \to [0, 1]$ is continuous at 0 and at all but a finite number of points. Furthermore, $|W|$ is dominated by a non-increasing, integrable function and $\int_0^\infty \left|\int_0^\infty W(t) \cos(xt)\, dt\right| dx < \infty$. The bandwidth $b_n$ satisfies $b_n \to \infty$ and $b_n/\sqrt{n} \to 0$ as $n \to \infty$.
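Putting the pieces together, the following Python sketch computes $\hat\sigma^2_{p,n}$ for the indicator kernel, using the Bartlett kernel as $W$ and the Epanechnikov kernel as $K$. These kernel choices, the function names, and the bandwidths in the usage lines are illustrative assumptions, not recommendations from the paper. For i.i.d. $N(0,1)$ data, $p = 1/2$ and $g(x,y) = (x+y)/2$, the target value is $\sigma_p^2 = \pi/3$.

```python
import math
import numpy as np

def epanechnikov(z):
    return np.where(np.abs(z) <= 1.0, 0.75 * (1.0 - z ** 2), 0.0)

def bartlett(z):
    """Bartlett kernel W(t) = (1 - |t|) 1{|t| <= 1}."""
    return max(0.0, 1.0 - abs(z))

def lrv_u_quantile(x, g, p, d_n, b_n):
    """Sketch of sigma^2_{p,n} for the indicator kernel h(x, y, t) = 1{g(x, y) <= t}."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    i, j = np.triu_indices(n, k=1)
    vals = np.sort(g(x[i], x[j]))
    t_hat = vals[max(math.ceil(p * len(vals) - 1e-9), 1) - 1]     # U_n^{-1}(p)
    u_hat = np.mean(epanechnikov((vals - t_hat) / d_n)) / d_n      # u_n(t_hat), cf. (8)
    full = (g(x[:, None], x[None, :]) <= t_hat).astype(float)       # h(X_a, X_b, t_hat) for all a, b
    h1_hat = full.mean(axis=1) - full.mean()                        # hat h_1(X_a, t_hat)
    total = 0.0
    for r in range(-(n - 1), n):
        w = bartlett(abs(r) / b_n)
        if w == 0.0:
            continue
        m = abs(r)
        total += w * np.dot(h1_hat[: n - m], h1_hat[m:]) / n        # W(|r|/b_n) * hat rho(r, t_hat)
    return 4.0 * total / u_hat ** 2

rng = np.random.default_rng(2)
x = rng.standard_normal(400)
n = len(x)
est = lrv_u_quantile(x, lambda a, b: (a + b) / 2, 0.5, d_n=0.3, b_n=n ** (1 / 3))
print(est, math.pi / 3)
```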

Assumption 5 essentially coincides with Assumption 1 of de Jong and Davidson [10]. It is satisfied by a large class of kernels, including the Bartlett kernel $W(t) = (1 - |t|)\mathbb{1}_{\{|t| \le 1\}}$. Finally, we need a continuity condition similar to Assumption 2 also for the kernel $g$.

Assumption 6. There is a constant $L$ such that for all $\epsilon > 0$

$$E\left[\sup_{(x,y):\, \|(x,y) - (X,Y)\| \le \epsilon} |g(x, y) - g(X, Y)|\right] \le L\epsilon,$$

where $X, Y$ are independent with the same distribution as $X_0$.

Conditions of this type (including Assumption 2 above) are also called variation conditions and were first introduced by Denker and Keller. They are mild regularity conditions, which we usually find to be fulfilled for kernels and data distributions that are of interest for statistical applications. Conditions on the distribution $F$ that are sufficient for Assumptions 2 and 3 in the case of the Hodges-Lehmann estimator are discussed in Remark 3.2.

We have the following consistency result for the long-run variance estimator. 
Theorem 2.4. Under Assumptions 1 to 6 we have $\hat{\sigma}_{p,n}^2 \to \sigma_p^2$ in probability as $n \to \infty$.


The following result is an immediate corollary of Theorems 2.3 and 2.4. Part (A) follows by Slutsky's lemma, and part (B) by a further application of the continuous mapping theorem.

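Corollary 2.5. Under Assumptions 1 to 6, we have

(A)
$$\left(\frac{\lfloor ns \rfloor}{\sqrt{n}\, \hat{\sigma}_{p,n}}\left(U_{\lfloor ns \rfloor}^{-1}(p) - U^{-1}(p)\right)\right)_{s \in [0,1]} \Rightarrow W$$

in the Skorokhod space $D[0,1]$, where $W$ is a standard Brownian motion, and

(B)
$$\max_{2 \le k \le n} \frac{k}{\sqrt{n}\, \hat{\sigma}_{p,n}} \left|U_k^{-1}(p) - U_n^{-1}(p)\right| \xrightarrow{d} \sup_{0 \le s \le 1} |B(s)|,$$

where $B(s) = W(s) - sW(1)$, $0 \le s \le 1$, is a standard Brownian bridge.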


We refer to the process in Corollary 2.5 (A) as the studentized U-quantile process.


3 Robust detection of changes in the central location 

We return to the question of change-point detection as outlined in the introduction. The practical implementation of the CUSUM test, cf. (2), requires the estimation of the long-run variance $\sigma_{CS}^2$, cf. (3), which is usually accomplished by a kernel estimator of the form

$$\hat{\sigma}_{CS,n}^2 = \sum_{k=-(n-1)}^{n-1} W(|k|/b_n) \left\{\frac{1}{n} \sum_{i=1}^{n-|k|} (X_i - \bar{X}_n)(X_{i+|k|} - \bar{X}_n)\right\}, \tag{9}$$

where $W$ and $b_n$ are as in Assumption 5. The CUSUM test is known to be inefficient under heavy tails and prone to outliers. It is interesting to note that, although outliers tend to increase the test statistic $T_{CS,n}$, the general effect outliers have on the test is not a size distortion but rather a loss of power: the test statistic is divided by the estimate $\hat{\sigma}_{CS,n}$, which is increased even more strongly by outliers. An intuitive approach to a robust, less outlier-sensitive change-point detection is to replace the sample mean in (2) by an alternative location estimator. We will pursue this approach in the following and examine the median and the Hodges-Lehmann estimator $h_n$, cf. (1), as potential alternatives.
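A sketch of the classical CUSUM test: the statistic (2) studentized by the kernel estimator (9). The Bartlett kernel as $W$, the bandwidth $b_n = n^{1/3}$, and the simulated level shift are assumptions made for illustration. The value 1.358 is the asymptotic 95% quantile of $\sup_t |B(t)|$.

```python
import numpy as np

def bartlett(z):
    return max(0.0, 1.0 - abs(z))

def lrv_cusum(x, b_n):
    """Kernel long-run variance estimator (9) with the Bartlett kernel as W."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    total = 0.0
    for k in range(-(n - 1), n):
        w = bartlett(abs(k) / b_n)
        if w > 0.0:
            m = abs(k)
            total += w * np.dot(xc[: n - m], xc[m:]) / n
    return total

def cusum_statistic(x):
    """T_{CS,n} = max_k k / sqrt(n) * |mean(X_1..X_k) - mean(X_1..X_n)|, cf. (2)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    k = np.arange(1, n + 1)
    partial_means = np.cumsum(x) / k
    return float(np.max(k / np.sqrt(n) * np.abs(partial_means - x.mean())))

rng = np.random.default_rng(3)
n = 200
x = rng.standard_normal(n) + 2.0 * (np.arange(1, n + 1) > n // 2)  # level shift after n/2
stat = cusum_statistic(x) / np.sqrt(lrv_cusum(x, b_n=n ** (1 / 3)))
print(stat, stat > 1.358)
```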

The problem of change-point-in-location detection is a classic one and well studied, see, e.g., the monograph by Csörgő and Horváth. Articles considering the problem under dependence include, among others, Andrews [1], Kokoszka and Leipus, Horváth et al. [23] and Horváth and Steinebach. The literature on robust analysis of the change-point problem is comparatively limited. There are approaches based, e.g., on ranks, M-estimators and U-statistics. All of these consider independent sequences. Recently, Hušková and Marušiaková considered robust change-point procedures for $\alpha$-mixing sequences. See Hušková [30] for a recent overview on robust change-point analysis. Høyland and Dehling and Fried consider two-sample tests based on the two-sample Hodges-Lehmann estimator for independent and dependent data, respectively, which may provide the basis for robust change-point tests based on the two-sample Hodges-Lehmann estimator.

When replacing the mean in (2), one possibility is the median, presumably the simplest robust location estimator, leading to the test statistic

$$T_{Med,n} = \max_{1 \le k \le n} \frac{k}{\sqrt{n}} \left|\hat{m}_k - \hat{m}_n\right|, \tag{10}$$
where $\hat{m}_k$ denotes the median of $X_1, \ldots, X_k$. Under the null hypothesis of no change and under appropriate regularity conditions (which include no moment conditions, but smoothness conditions on the distribution $F$ of $X_1$), $T_{Med,n}$ converges in distribution to $\sigma_{Med} \sup_{t \in [0,1]} |B(t)|$ with

$$\sigma_{Med}^2 = \frac{1}{f(m)^2} \sum_{k=-\infty}^{\infty} \operatorname{cov}\left(\mathbb{1}_{\{X_0 \le m\}}, \mathbb{1}_{\{X_k \le m\}}\right), \tag{11}$$

where $m = F^{-1}(1/2)$ denotes the median of the distribution $F$ and $f$ its density. This convergence
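result as well as the consistency of the long-run variance estimator

$$\hat{\sigma}_{Med,n}^2 = \frac{1}{\hat{f}_n(\hat{m}_n)^2} \sum_{k=-(n-1)}^{n-1} W(|k|/b_n) \left\{\frac{1}{n} \sum_{i=1}^{n-|k|} \left(\mathbb{1}_{\{X_i \le \hat{m}_n\}} - \tfrac{1}{2}\right)\left(\mathbb{1}_{\{X_{i+|k|} \le \hat{m}_n\}} - \tfrac{1}{2}\right)\right\} \tag{12}$$

with a suitable kernel density estimator

$$\hat{f}_n(x) = \frac{1}{n d_n} \sum_{k=1}^{n} K\left(\frac{X_k - x}{d_n}\right) \tag{13}$$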

can be shown by similar techniques as Theorems 2.3 and 2.4. However, this robustification comes at the price of a substantial loss in efficiency at normality. The median is known to possess an asymptotic relative efficiency of $2/\pi \approx 64\%$ with respect to the mean for independent Gaussian observations. Hence we propose to use the Hodges-Lehmann estimator, which is also highly robust but possesses an asymptotic relative efficiency of $3/\pi \approx 95\%$ with respect to the mean at normality. This leads to the test statistic

$$T_{HL,n} = \max_{1 \le k \le n} \frac{k}{\sqrt{n}} \left|h_k - h_n\right|. \tag{14}$$


It should be noted that Hodges and Lehmann actually consider the variant $\tilde{h}_n = \operatorname{median}\{(X_i + X_j)/2, X_k \mid 1 \le i < j \le n, 1 \le k \le n\}$. Since $h_n$ and $\tilde{h}_n$ behave very similarly and are asymptotically equivalent, we stick to the variant $h_n$, to which the U-quantile theory applies directly.

For stationary and short-range dependent sequences, the test statistic $T_{HL,n}$ converges in distribution to $\sigma_{HL} \sup_{t \in [0,1]} |B(t)|$, where $B$ is a Brownian bridge and

$$\sigma_{HL}^2 = \frac{4}{u(h)^2} \sum_{k=-\infty}^{\infty} E\left(\psi(X_0)\psi(X_k)\right). \tag{15}$$

Here, $u$ is the density of the distribution of $(X + Y)/2$ for $X, Y \sim F$ independent, $h$ its median, and $\psi(x) = P((x + Y)/2 \le h) - 1/2$ for $Y \sim F$. Implementing the long-run variance estimation
technique for U-quantiles described in Section 2, one obtains

$$\hat{\sigma}_{HL,n}^2 = \frac{4}{u_n(h_n)^2} \sum_{k=-(n-1)}^{n-1} W(|k|/b_n) \left\{\frac{1}{n} \sum_{i=1}^{n-|k|} \hat{\psi}_n(X_i)\hat{\psi}_n(X_{i+|k|})\right\}, \tag{16}$$

where $u_n$ is given by (8) for the kernel $g(x, y) = (x + y)/2$, and $\hat{\psi}_n(x) = n^{-1} \sum_{j=1}^{n} \left(\mathbb{1}_{\{(x + X_j)/2 \le h_n\}} - 1/2\right)$. The asymptotic behavior of the studentized test statistic $T_{HL,n}/\hat{\sigma}_{HL,n}$ is given by Corollary 2.5 (B) and is summarized in the following corollary.

Corollary 3.1. Let $(X_i)_{i \in \mathbb{Z}}$ be a stationary sequence with marginal distribution $F$ which satisfies Assumption 1. Let $F$ be such that Assumptions 2 and 3 are fulfilled for the kernel $h(x, y, t) = \mathbb{1}_{\{(x+y)/2 \le t\}}$. If further Assumptions 4 and 5 are satisfied, then $T_{HL,n}/\hat{\sigma}_{HL,n}$ converges in distribution to $\sup_{t \in [0,1]} |B(t)|$, where $B$ is, as before, a Brownian bridge.
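A corresponding sketch of the studentized Hodges-Lehmann test $T_{HL,n}/\hat\sigma_{HL,n}$ from (14) and (16). The Bartlett and Epanechnikov kernels and all bandwidths are illustrative assumptions, and the optional argument `skip` (our own name) drops the first estimates before maximizing, mimicking the ad-hoc fix described in the simulation study. The maximum starts at $k = 2$ because $h_1$ is not defined; recomputing $h_k$ for every $k$ costs $O(n^3 \log n)$ time overall, which is acceptable for the sample sizes considered here. The usage example applies the test to Cauchy data with a level shift, where the classical CUSUM test breaks down.

```python
import numpy as np

def epanechnikov(z):
    return np.where(np.abs(z) <= 1.0, 0.75 * (1.0 - z ** 2), 0.0)

def bartlett(z):
    return max(0.0, 1.0 - abs(z))

def hl(x):
    """Hodges-Lehmann estimator (1) of a NumPy array."""
    i, j = np.triu_indices(len(x), k=1)
    return np.median((x[i] + x[j]) / 2)

def hl_test(x, d_n, b_n, skip=0):
    """Studentized statistic T_{HL,n} / sigma_{HL,n}, cf. (14) and (16)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    h_n = hl(x)
    ks = np.arange(max(2, skip + 1), n + 1)   # h_k needs at least two observations
    t_hl = max(k / np.sqrt(n) * abs(hl(x[:k]) - h_n) for k in ks)
    i, j = np.triu_indices(n, k=1)
    u_hat = np.mean(epanechnikov(((x[i] + x[j]) / 2 - h_n) / d_n)) / d_n   # u_n(h_n), cf. (8)
    psi = np.mean((x[:, None] + x[None, :]) / 2 <= h_n, axis=1) - 0.5     # hat psi_n(X_i)
    lrv = 0.0
    for k in range(-(n - 1), n):
        w = bartlett(abs(k) / b_n)
        if w > 0.0:
            m = abs(k)
            lrv += w * np.dot(psi[: n - m], psi[m:]) / n
    sigma2 = 4.0 * lrv / u_hat ** 2
    return float(t_hl / np.sqrt(sigma2))

rng = np.random.default_rng(4)
n = 200
x = rng.standard_cauchy(n) + 3.0 * (np.arange(1, n + 1) > n // 2)  # heavy tails plus level shift
stat = hl_test(x, d_n=0.5, b_n=n ** (1 / 3), skip=10)
print(stat, stat > 1.358)
```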
Remark 3.2. It is desirable to translate Assumptions 2 and 3 for this kernel $h$ into a set of easy-to-verify conditions on $F$. It is sufficient that $F$ possesses a Lebesgue density $f$ which satisfies the following three conditions:

(A) $f$ is càdlàg on $\mathbb{R}$,

(B) $\sup_{-\infty < s < t < \infty} \dfrac{\ldots - f(s)}{t - s} < \infty$, and

(C) the support of $f$ (i.e., the closure of $\{x \mid f(x) > 0\}$) is a connected set or $f$ is symmetric around some point in $\mathbb{R}$.

Assumption 6 is met for all distributions $F$ since the kernel $g(x, y) = (x + y)/2$ is Lipschitz continuous. The function $f$ being both a density and càdlàg (right-continuous with left-hand limits) on $\mathbb{R}$ implies that $f$ is bounded, hence $F$ is Lipschitz continuous, from which Assumption 2 follows. Concerning Assumption 3, $f$ being càdlàg also implies that $f$ has at most countably many discontinuity points, which together with (B) implies (6). Condition (B) is fulfilled, e.g., if $f$ possesses a right-hand side derivative $f'$ everywhere, and $f'$ is càdlàg. Note that the U-density $u$ in this case is, up to re-scaling, the convolution of $f$ with itself. Finally, either of the conditions of (C) ensures that the U-density $u$ is non-zero in a neighborhood around its median.


4 Simulations 

We present Monte Carlo simulation results to investigate the size and power properties of the three tests discussed in the previous section. We have carried out simulations for several sample sizes, but the results presented are for $n = 240$ only. This sample size is large enough for the asymptotics to provide sensible approximations, and the picture is the same at other sample sizes as far as the comparison of the tests is concerned. Throughout, we use 1000 replications. We consider two different scenarios concerning the characteristics of the marginal distribution of the data-generating process:

(A) symmetric data distributions, 

(B) skewed data distributions. The set-up will be such that a change in variance occurs along with 
the change in location. 

In scenario (A), we generate data from the following general one-change-point model:

$$X_i = Y_i + \mu\,\mathbb{1}_{\{i > \lfloor \theta n \rfloor\}}, \quad i = 1, \ldots, n,$$

where $Y_i$, $i \in \mathbb{Z}$, is a stationary sequence, $\mu$ the jump height, and $\theta$ a jump location parameter. We use the following three marginal distributions for the process $(Y_i)_{i \in \mathbb{Z}}$: normal, $t_3$, and $t_1$. The $t_\nu$ distribution with parameter $\nu > 0$ has the density $f_\nu(x) = \{\sqrt{\nu}\, B(\nu/2, 1/2)\}^{-1}(1 + x^2/\nu)^{-(\nu+1)/2}$, where $B$ is the beta function. In order to make the jump sizes better comparable among the different marginal distributions, we scale the distribution such that the median (of the distribution) of $|Y_i|$ is the same as in the normal case, i.e., we multiply the realizations by $z_{3/4}/t_{\nu;3/4}$, where $z_\alpha$ and $t_{\nu;\alpha}$ denote the $\alpha$-quantiles of the normal distribution and the $t_\nu$ distribution, respectively.
Concerning the serial dependence of the sequences, we consider two cases:

(A.1) independence, i.e., the $Y_i$, $i \in \mathbb{Z}$, are i.i.d.

(A.2) AR(1), i.e., $Y_i = (z_{3/4}/t_{\nu;3/4})\, F_\nu^{-1}\big(\Phi(\sqrt{1 - \phi^2}\, Z_i)\big)$, where the $Z_i$ fulfill the auto-regressive equation $Z_i = \phi Z_{i-1} + E_i$ with $E_i \sim N(0, 1)$ i.i.d. and $\phi = 0.4$. Here, $F_\nu^{-1}$ denotes the quantile function of the $t_\nu$ distribution and $\Phi$ the cdf of the standard normal distribution. Thus $(Y_i)_{i \in \mathbb{Z}}$ is a marginal transformation of a Gaussian AR process. It is again scaled such that $\operatorname{median}(|Y_i|) = z_{3/4}$.
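The data-generating process can be sketched as follows. This is our own reading of the model, not the authors' simulation code: we assume the AR(1) variables are standardized as $\Phi(\sqrt{1-\phi^2}\, Z_i)$ so that the argument of $\Phi$ is standard normal, and we implement the $t_\nu$ quantile function only for $\nu \in \{1, 3\}$ via closed-form cdfs to avoid external dependencies. All function names are hypothetical.

```python
import math
from statistics import NormalDist
import numpy as np

def t_cdf(x, nu):
    """Closed-form cdf of the t_nu distribution for nu = 1 (Cauchy) and nu = 3."""
    if nu == 1:
        return 0.5 + math.atan(x) / math.pi
    if nu == 3:
        y = x / math.sqrt(3.0)
        return 0.5 + (y / (1.0 + y * y) + math.atan(y)) / math.pi
    raise ValueError("this sketch only covers nu in {1, 3}")

def t_quantile(u, nu):
    """F_nu^{-1}(u) by explicit inversion (nu = 1) or bisection (nu = 3)."""
    if nu == 1:
        return math.tan(math.pi * (u - 0.5))
    lo, hi = -1.0, 1.0
    while t_cdf(lo, nu) > u:
        lo *= 2.0
    while t_cdf(hi, nu) < u:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if t_cdf(mid, nu) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def simulate(n, mu, theta, dist="normal", phi=0.0, rng=None):
    """X_i = Y_i + mu * 1{i > floor(theta * n)}; phi = 0 gives (A.1), phi = 0.4 gives (A.2)."""
    rng = np.random.default_rng() if rng is None else rng
    z = np.empty(n)
    z[0] = rng.standard_normal() / math.sqrt(1.0 - phi ** 2)   # stationary start
    for i in range(1, n):
        z[i] = phi * z[i - 1] + rng.standard_normal()
    std = z * math.sqrt(1.0 - phi ** 2)                          # N(0, 1) marginals
    if dist == "normal":
        y = std
    else:
        nu = {"t3": 3, "t1": 1}[dist]
        scale = NormalDist().inv_cdf(0.75) / t_quantile(0.75, nu)  # z_{3/4} / t_{nu;3/4}
        eps = 1e-12                                              # keep u strictly inside (0, 1)
        y = np.array([scale * t_quantile(min(max(NormalDist().cdf(s), eps), 1 - eps), nu)
                      for s in std])
    return y + mu * (np.arange(1, n + 1) > math.floor(theta * n))

x0 = simulate(240, 0.0, 0.5, "t3", 0.4, np.random.default_rng(7))
x5 = simulate(240, 5.0, 0.5, "t3", 0.4, np.random.default_rng(7))  # same noise, jump of 5
```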






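In the independence case (A.1), the values of the long-run variances are

$$\sigma_{CS}^2 = \operatorname{var}(Y_1) = \begin{cases} 1 & \text{for the normal distribution,} \\ \nu/(\nu - 2) & \text{for the } t_\nu \text{ distribution } (\nu > 2), \end{cases}$$

$$\sigma_{HL}^2 = \begin{cases} \pi/3 & \text{for the normal distribution,} \\ 1/(3 u_\nu^2) & \text{for the } t_\nu \text{ distribution,} \end{cases}$$

where $u_\nu = 2\int f_\nu^2(x)\,dx$. Explicit expressions are available for the convolution of a $t_\nu$ density with itself for odd integer $\nu$, see Nadarajah and Dey. We obtain $\sigma_{HL}^2 = (2\pi/5)^2$ for $\nu = 3$ and $\sigma_{HL}^2 = \pi^2/3$ for $\nu = 1$. Furthermore

$$\sigma_{Med}^2 = \begin{cases} \pi/2 & \text{for the normal distribution,} \\ 1/(4 f_\nu(0)^2) & \text{for the } t_\nu \text{ distribution.} \end{cases}$$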
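The stated values can be cross-checked numerically. For a distribution symmetric about 0, $\psi(X_1) = 1/2 - F(X_1)$ is uniform on $[-1/2, 1/2]$ with variance $1/12$, so in the i.i.d. case (15) reduces to $\sigma_{HL}^2 = 1/(3u^2)$ with $u = 2\int f^2$, and (11) reduces to $\sigma_{Med}^2 = 1/(4f(0)^2)$. The sketch below (our own helper names) evaluates the integral with a midpoint rule after the substitution $x = \tan s$; like the formulas above, it uses the unscaled densities.

```python
import math

def f_t(x, nu):
    """t_nu density {sqrt(nu) B(nu/2, 1/2)}^{-1} (1 + x^2/nu)^{-(nu+1)/2}."""
    beta = math.gamma(nu / 2) * math.gamma(0.5) / math.gamma((nu + 1) / 2)
    return (1.0 + x * x / nu) ** (-(nu + 1) / 2) / (math.sqrt(nu) * beta)

def f_normal(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def integral_over_R(func, m=20000):
    """Midpoint rule for the integral over R after substituting x = tan(s), s in (-pi/2, pi/2)."""
    h = math.pi / m
    total = 0.0
    for k in range(m):
        s = -math.pi / 2 + (k + 0.5) * h
        total += func(math.tan(s)) / math.cos(s) ** 2
    return total * h

def sigma2_hl_iid(f):
    u = 2.0 * integral_over_R(lambda x: f(x) ** 2)  # density of (X + Y)/2 at the median 0
    return 1.0 / (3.0 * u ** 2)

def sigma2_med_iid(f):
    return 1.0 / (4.0 * f(0.0) ** 2)

print(sigma2_hl_iid(f_normal), math.pi / 3)
print(sigma2_hl_iid(lambda x: f_t(x, 3)), (2 * math.pi / 5) ** 2)
print(sigma2_hl_iid(lambda x: f_t(x, 1)), math.pi ** 2 / 3)
print(sigma2_med_iid(f_normal), math.pi / 2)
```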


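In the AR(1) scenario (A.2), we have

$$\sigma_{CS}^2 = (1 + \phi)/(1 - \phi)$$

for normality. As for the $t_\nu$ distribution, we are not aware of an explicit expression for the moment correlation of a bivariate distribution characterized by a Gaussian copula and $t_\nu$ margins. We have furthermore

$$\sigma_{HL}^2 = \begin{cases} \dfrac{\pi}{3} + 4\sum_{k=1}^{\infty} \arcsin\left(\dfrac{\phi^k}{2}\right) & \text{for the normal distribution,} \\ \dfrac{1}{u_\nu^2}\left\{\dfrac{1}{3} + \dfrac{4}{\pi}\sum_{k=1}^{\infty} \arcsin\left(\dfrac{\phi^k}{2}\right)\right\} & \text{for the } t_\nu \text{ distribution,} \end{cases}$$

$$\sigma_{Med}^2 = \begin{cases} \dfrac{\pi}{2} + 2\sum_{k=1}^{\infty} \arcsin(\phi^k) & \text{for the normal distribution,} \\ \dfrac{1}{4 f_\nu(0)^2}\left\{1 + \dfrac{4}{\pi}\sum_{k=1}^{\infty} \arcsin(\phi^k)\right\} & \text{for the } t_\nu \text{ distribution.} \end{cases}$$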


We can thus study the behavior of the test statistics under the null and of their respective long-run variance estimators individually. In the tables below, we distinguish three ways of dealing with the long-run variance. We use the

① known values,
② marginal variance estimates, and
③ full long-run variance estimates.                                      (17)

Full long-run variance estimation adjusts for possible serial dependence, i.e., we use the estimators $\hat{\sigma}_{CS,n}^2$, $\hat{\sigma}_{HL,n}^2$ and $\hat{\sigma}_{Med,n}^2$ as given by (9), (16) and (12), respectively. Marginal long-run variance estimation means we assume independence and only include the summand that corresponds to $k = 0$ in the sums in (9), (16) and (12). We take the following choices for bandwidths and kernels:


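$$K(t) = \tfrac{3}{4}(1 - t^2)\,\mathbb{1}_{\{|t| \le 1\}}, \qquad W(t) = (1 - t^2)^2\, \mathbb{1}_{\{|t| \le 1\}}, \qquad d_n = I_n\, n^{-\ldots}, \qquad b_n = 2 n^{\ldots}, \tag{18}$$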


where $I_n$ denotes the sample interquartile range of the data points the kernel density estimator is applied to. The kernel $K$ above is known as the Epanechnikov kernel, and $W$ as the quartic kernel. The two kernels serve different purposes: $K$ is used for density estimation and must be scaled such that it integrates to 1, while $W$ is used for autocorrelation-consistent variance estimation and must be scaled such that $W(0) = 1$. These choices are ultimately arbitrary, but they have been shown to perform well in simulations over a wide range of scenarios. The results generally differ very little with respect to the choice of the kernel. We compute for each sample the test statistics $T_{CS,n}$, $T_{HL,n}$ and $T_{Med,n}$, divide them by the square root of the corresponding long-run variance estimate and count how often the thus adjusted test statistic exceeds the critical value 1.358, which is the 95% quantile of the limiting distribution $\sup_{0 \le t \le 1} |B(t)|$. Although based on highly robust estimators, the test statistics $T_{HL,n}$ and $T_{Med,n}$ are susceptible to outliers: problems can arise when several extreme values occur at the beginning of the sequence. In order to improve the robustness of the tests, we apply an ad-hoc fix and simply exclude the first 10 values from the sequences of successive estimates before taking the maximum.

Table 1: Test size. Rejection frequencies (%) at the asymptotic 5% significance level of the CUSUM test (2), the Hodges-Lehmann test (14), and the median test (10) under no change. Marginal distributions: normal, $t_3$, and $t_1$; dependence scenarios: independence and AR(1) with parameter $\phi = 0.4$; sample size $n = 240$; 1000 runs. Long-run variance estimation: ①, ②, ③, cf. (17).

test:                          CUSUM            Hodges-Lehmann     median
long-run variance:             ①    ②    ③      ①    ②    ③       ①    ②    ③
independent    normal          4    4    3      3    3    3       9    8    8
data           t_3             5    3    2      4    4    2       8    7    8
               t_1            n/a   1    1      7    6    5      10    8   10
AR(1),         normal          4   31    3      4   30    3       8   27    8
φ = 0.4        t_3            n/a  26    3      4   30    3       9   29   10
               t_1            n/a   6    0      8   34    5       9   26    8
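The critical value 1.358 used for all three tests can be reproduced from the well-known series for the distribution of $\sup_{0 \le t \le 1} |B(t)|$ (the Kolmogorov distribution), $P(\sup_t |B(t)| \le x) = 1 - 2\sum_{k \ge 1} (-1)^{k-1} e^{-2k^2x^2}$. A short bisection sketch, with helper names of our own:

```python
import math

def kolmogorov_cdf(x, terms=100):
    """P(sup_{0<=t<=1} |B(t)| <= x) = 1 - 2 * sum_{k>=1} (-1)^(k-1) exp(-2 k^2 x^2)."""
    if x <= 0:
        return 0.0
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * x * x) for k in range(1, terms + 1))
    return 1.0 - 2.0 * s

def kolmogorov_quantile(q, lo=0.1, hi=5.0):
    """Invert the (increasing) cdf by bisection."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if kolmogorov_cdf(mid) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(round(kolmogorov_quantile(0.95), 3))  # -> 1.358
```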

Figure 1: A typical trajectory of the change-point process of the median test, $\left(\frac{k}{\sqrt{n}\,\hat{\sigma}_{Med,n}}(\hat{m}_k - \hat{m}_n)\right)_{k=1,\ldots,n}$ (black), and the analogue for the CUSUM test (gray) for $n = 240$ independent standard normal observations.


Analysis of size. The results for the size of the tests are summarized in Table 1. We observe the following:

(1) The CUSUM test and the test based on the Hodges-Lehmann estimator (referred to as the Hodges-Lehmann test in the following) keep the nominal size of 5% for the normal and the $t_3$ distribution under independence as well as under dependence, but appear to be slightly conservative.

(2) The median test shows a substantial size distortion in all situations, even when the test statistic is adjusted by the true variance. The distortion also persists for considerably larger $n$. This size distortion is in line with results reported in Shao and Zhang (2012), for which the self-normalization approach proposed by the authors does not provide a remedy either. This behavior may be described as a discretization problem in finite samples: the $\hat{m}_k$, $k = 1, \ldots, n$, take on only a small number of distinct values. The resulting paths differ strongly from the paths of a Brownian bridge also for large samples ($n \ge 1000$), and the distribution of the supremum approaches its limit very slowly. In principle, this discretization applies also to the Hodges-Lehmann test, but to a negligible extent: for $n = 240$, the Hodges-Lehmann estimator is the median of $240 \cdot 239/2 = 28{,}680$, i.e., about 30,000, distinct values (in case of continuous models). A typical trajectory of the median change-point process of a standard normal i.i.d. sample with estimated long-run variance is depicted in Figure 1. Due to the size distortion, the median test is excluded from the detailed power considerations in the following. In summary, it has a power comparable to that of the Hodges-Lehmann test when not corrected for size, but when corrected for size, it has a much lower efficiency in all situations.

(3) As expected, the marginal variance estimation fails in the AR(1) case. Ignoring the serial dependence leads to clearly wrong results, with rejection rates of up to 34% at the nominal 5% level.
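The discretization argument can be illustrated with a small sketch (arbitrary seed, i.i.d. standard normal data, our own variable names): it counts the distinct values among the running medians $\hat m_1, \ldots, \hat m_n$ and among the pairwise averages entering $h_n$. Only the second count is known in advance, namely $n(n-1)/2$ for continuous data.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 240
x = rng.standard_normal(n)

running_medians = [np.median(x[:k]) for k in range(1, n + 1)]  # m_1, ..., m_n
i, j = np.triu_indices(n, k=1)
pair_averages = (x[i] + x[j]) / 2                               # values whose median is h_n

print(len(set(running_medians)))      # far fewer than n distinct running medians
print(len(np.unique(pair_averages)))  # n(n-1)/2 = 28680 distinct pairwise averages
```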


Table 2: Test power under independence. Rejection frequencies (%) at the asymptotic 5% significance level of the CUSUM test (2), the Hodges-Lehmann test (14) and the median test (10) under one-jump alternatives for independent errors. Data distributions: normal, $t_3$, and $t_1$; sample size $n = 240$; 1000 runs. Long-run variance estimation: ①, ②, ③, cf. (17).

                 jump:                 test:   CUSUM            Hodges-Lehmann
data             location θ   height μ         ①    ②    ③      ①    ②    ③
normal data      1/2          1/4              38   37   29     38   36   29
                              1/2              94   93   86     93   92   84
                              1               100  100  100    100  100  100
                 3/4          1/4              19   19   12     19   18   11
                              1/2              75   75   49     74   71   46
                              1               100  100  100    100  100   98
t_3 data         1/2          1/4              16   19   14     31   29   22
                              1/2              57   63   51     86   85   75
                              1               100   98   96    100  100  100
                 3/4          1/4               8    9    6     18   15    9
                              1/2              33   38   22     65   62   39
                              1                95   93   81    100  100   95
t_1 data         1/2          1/4             n/a   10   10     31   25   18
                              1/2             n/a    2    2     81   74   58
                              1               n/a    9    6    100  100   99
                 3/4          1/4             n/a   10   10     18   13   10
                              1/2             n/a    2   10     60   52   28
                              1               n/a    3    2     99   97   73

Analysis of power. In Table power results in the independence scenario (A.l) for several alterna¬ 
tives are given. We consider jump heights /i = 1/4,1/2,1 and jump locations 9 = 1/4,1/2,3/4. The 
results for 0 = 1/4 are similar to those for 0 = 3/4 and not reported here. We find from Table 

(1) As expected, the CUSUM test has no power at the $t_1$ distribution. Since the second moments of the $t_1$ distribution are infinite, neither the CUSUM test statistic nor the long-run variance estimator $\hat\sigma^2_{CS,n}$ converges.

(2) The CUSUM test and the Hodges-Lehmann test perform very similarly at the normal distribution, with minor advantages for the CUSUM test. The Hodges-Lehmann test is clearly more efficient at the $t_3$ distribution and still has good power at the $t_1$ distribution.
Table 3: Test power for AR(1) with $\varphi = 0.4$. Rejection frequencies (%) at the asymptotic 5% significance level of the CUSUM test, the Hodges-Lehmann test (14), and the median test (10) under one-jump alternatives with AR(1) errors. Marginal data distributions: normal, $t_3$, and $t_1$; sample size $n = 240$; 1000 runs. Long-run variance estimation: (1), (3), cf. (11).

data   | jump location θ | jump height μ | CUSUM (1) (3) | Hodges-Lehmann (1) (3)
normal | 1/2 | 1/4 | 17 14 | 18 13
normal | 1/2 | 1/2 | 57 47 | 56 45
normal | 1/2 | 1 | 99 98 | 99 98
normal | 3/4 | 1/4 | 9 6 | 9 6
normal | 3/4 | 1/2 | 35 20 | 34 17
normal | 3/4 | 1 | 94 77 | 94 71
$t_3$ | 1/2 | 1/4 | – 7 | 14 10
$t_3$ | 1/2 | 1/2 | – 24 | 51 37
$t_3$ | 1/2 | 1 | – 79 | 99 92
$t_3$ | 3/4 | 1/4 | – 3 | 8 6
$t_3$ | 3/4 | 1/2 | – 10 | 29 14
$t_3$ | 3/4 | 1 | – 44 | 90 57
$t_1$ | 1/2 | 1/4 | – 10 | 16 11
$t_1$ | 1/2 | 1/2 | – 2 | 46 28
$t_1$ | 1/2 | 1 | – 3 | 97 79
$t_1$ | 3/4 | 1/4 | – 10 | 13 7
$t_1$ | 3/4 | 1/2 | – 10 | 25 12
$t_1$ | 3/4 | 1 | – 10 | 76 29


(3) Comparing the power of the tests with known variance and with estimated variance, we find that, although a change in location generally increases the variance estimate and thus decreases the power of the test, this effect is rather small in the case of the marginal variance estimation (compare the columns for the known and for the marginally estimated long-run variance). The marginal variance estimation provides an upper bound on what might possibly be gained by a sophisticated, data-adaptive selection of the bandwidth.

Table 3 gives power results for the AR(1) scenario (A.2) with the same choices of the parameters $\mu$ and $\theta$ and the same marginal distributions as in Table 2. All tests have lower power in the presence of positive autocorrelation, but the conclusions concerning the ranking of the tests are the same as in the independent case.

The data generating process in scenario (B) is similar to that in scenario (A). The data follow the one-change-point model

$$X_i = \begin{cases} Y_i, & 1 \le i \le \lfloor \theta n \rfloor, \\ Y_i/\lambda_2, & \lfloor \theta n \rfloor + 1 \le i \le n, \end{cases}$$

where the $Y_i$, $i \in \mathbb{Z}$, are exponentially distributed with parameter $\lambda = 1$. Instead of a change in the central location of a symmetric distribution, we now consider a change in the parameter $\lambda$ of the exponential distribution, which implies a change in the variability along with the change in location. The set-up is inspired by the river Elbe discharge data example in Section 5, which, as a referee has pointed out, exhibits such features. To give an impression of how the tests perform in such a situation, we only consider independent observations and a change in the middle of the observed period, i.e., $\theta = 1/2$. The kernel and bandwidth choices for the long-run variance
Table 4: Exponential distribution. Rejection frequencies (%) at the asymptotic 5% significance level of the CUSUM test, the Hodges-Lehmann test (14), and the median test (10) at independent exponentially distributed observations. The parameter $\lambda$ changes from 1 in the first half to $\lambda_2$ in the second half; sample size $n = 240$; 1000 runs. Long-run variance estimation: (1), (2), (3), cf. (11).

mean in 2nd half ($1/\lambda_2$) | CUSUM (1) (2) (3) | Hodges-Lehmann (1) (2) (3) | median (1) (2) (3)
1 | 4 3 3 | 5 5 4 | 11 8 9
4/5 | 29 22 – | 32 27 – | 32 32 –
2/3 | 78 66 – | 77 70 – | 64 59 –
1/2 | 100 98 – | 100 99 – | 97 92 –
1/3 | 100 100 – | 100 100 – | 100 100 –

estimation are as in scenario (A). As before, we use $n = 240$ observations and 1000 repetitions. For i.i.d. sequences $Y_i$, $i \in \mathbb{Z}$, of Exp(1) random variables, we have $E(Y_i) = 1$, $m = \mathrm{median}(Y_i) = \log(2)$, and $\sigma^2_{CS} = \sigma^2_{Med} = \mathrm{var}(Y_i) = 1$. Furthermore, the population value $h$ of the Hodges-Lehmann estimator is the solution of $2(1 + 2h) = e^{2h}$, and $\sigma^2_{HL} = \{3 - (2h - 1)^2\}/(2h)^2$. The empirical rejection probabilities of the three tests under scenario (B) for several values of $\lambda_2$ are given in Table 4. We find that, as in scenario (A) under normality, the CUSUM test and the Hodges-Lehmann test behave similarly and appear to detect changes in location equally well if a change in variance occurs at the same time. Here we also include power results for the median test and note that its power is similar to that of the other tests, but that it clearly exceeds the nominal 5% level under the null.
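The population values for Exp(1) data can be checked numerically. The following sketch uses only the formulas just stated: it solves $2(1 + 2h) = e^{2h}$ by plain bisection (our choice of root finder; the helper name `bisect` is hypothetical) and evaluates $\sigma^2_{HL}$, together with $\sigma^2_{Med} = 1/(4f(m)^2)$, where $f(y) = e^{-y}$ is the Exp(1) density.

```python
import math


def bisect(f, lo, hi, tol=1e-12):
    """Return a root of f in [lo, hi] by bisection (f(lo), f(hi) must differ in sign)."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid  # root lies in [mid, hi]
        else:
            hi = mid               # root lies in [lo, mid]
    return 0.5 * (lo + hi)


# Population Hodges-Lehmann value for Exp(1): P((Y1 + Y2)/2 <= h) = 1/2.
# Since Y1 + Y2 ~ Gamma(2, 1), this reads exp(-2h)(1 + 2h) = 1/2, i.e. 2(1 + 2h) = exp(2h).
h = bisect(lambda x: 2.0 * (1.0 + 2.0 * x) - math.exp(2.0 * x), 0.1, 2.0)

# Long-run variances for i.i.d. Exp(1) observations.
sigma2_cs = 1.0                                  # var(Y_i)
m = math.log(2.0)                                # median of Exp(1)
sigma2_med = 1.0 / (4.0 * math.exp(-m) ** 2)     # 1 / (4 f(m)^2) with f(y) = exp(-y)
sigma2_hl = (3.0 - (2.0 * h - 1.0) ** 2) / (2.0 * h) ** 2

print(f"h = {h:.4f}, sigma2_HL = {sigma2_hl:.4f}, sigma2_Med = {sigma2_med:.4f}")
```

This gives $h \approx 0.839$ and $\sigma^2_{HL} \approx 0.90$, i.e., a slightly smaller long-run variance for the Hodges-Lehmann estimator than for the mean and the median.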


5 Data examples 

We consider two data sets, both from hydrology: the maximum annual discharge of the river Elbe 
at Dresden and the annual rainfall in Argentina. 

The first data set has recently been analyzed by Sharipov et al. [13]. It consists of the annual maximum discharge of the river Elbe at Dresden, Germany, in the years 1851 to 2012. The time series is depicted in Figure 2. There appears to be a shift in the time series around the year 1900, with the annual maximum discharge being lower on average afterwards. Industrialization and infrastructural development at the end of the 19th century led to a significant discharge of industrial sewage into the river Elbe upstream from Dresden, making the river less prone to freezing in winter and resulting in lower spring floods. The series is clearly non-normal, cf. Figure 4 (left). It exhibits a heavy upper tail, with three extreme floods in 1862, 1890, and 2002. Extreme events tend to dominate any moment-based analysis such as the CUSUM test, potentially obscuring the visible change in the central location. Applying the CUSUM and the Hodges-Lehmann test with the choices for $K$, $W$, $d_n$, and $b_n$ as in the simulation section, cf. (11), we observe that both change-point processes, i.e., the processes based on $(\hat h_k - \hat h_n)_{k=1,\ldots,n}$ and on $(\bar X_k - \bar X_n)_{k=1,\ldots,n}$, which are depicted in the lower plot of Figure 2, look similar and attain their maxima in 1900. However, the test decisions at the 5% significance level differ: contrary to the Hodges-Lehmann test, the CUSUM test does not reject the hypothesis of no change. The HAC bandwidth $b_n$ is, however, chosen rather large, while a look at the sample autocorrelations suggests that it is legitimate to treat the observations as independent. When the autocovariances are excluded from the long-run variance estimation, both tests reject the null hypothesis. The heavy tail renders the CUSUM test inefficient, making the test outcome at the 5% level sensitive to the choice of tuning parameters, whereas the Hodges-Lehmann test clearly detects the change regardless of the choice of $b_n$. Along with the average yearly maximum discharge, the variability of the time series also decreases. The simulation results of
Figure 2: Top: Maximum yearly discharge (in cubic meters per second) of the river Elbe at Dresden from 1851 until 2012 ($n = 162$). Bottom: Change-point processes based on $(\hat h_k - \hat h_n)_{k=1,\ldots,n}$ (solid line) and $(\bar X_k - \bar X_n)_{k=1,\ldots,n}$ (dashed line).


scenario (B) in the previous section suggest that both the CUSUM and the Hodges-Lehmann test are valid in such a situation.

The second example is the Argentina rainfall data, which has previously been analyzed in a change-point context by Wu et al. [48] and Shao and Zhang [42]. Also in this example, there is evidence (a dam built from 1952 to 1962) supporting the assumption of a change in the central location. The series is depicted in Figure 3. The normal quantile plot (Figure 4, right) reveals fair agreement with normality, and in fact the Hodges-Lehmann and the CUSUM test behave similarly, with both processes attaining their minima in 1955. Both reject the null hypothesis at the 5% level, cf. Figure 3. Following the analysis of Shao and Zhang [42], we also apply the median-based test to this data example (dotted line in Figure 3). The median test does not reject the hypothesis of no change. This is in apparent contradiction to the analysis by Shao and Zhang [42], who report a p-value of less than 0.001. The authors apply a self-normalized version of the test, but since self-normalization tends to decrease the power, this is unlikely to be responsible for the different results. We suspect that Shao and Zhang [42] applied the median-based test in the same manner as the CUSUM test, restricting the location of the potential change-point to the years 1952–1962, which makes the test largely resemble a two-sample test.

6 Summary and discussion 

We have proved a functional limit theorem for the general U-quantile process for short-range dependent data. We have furthermore established the consistency of an HAC kernel estimator for the long-run variance. The results are formulated under very mild conditions on the data. We use near epoch dependence in probability (PNED) on mixing sequences to capture the short-range dependence, which does not imply any moment conditions.

As an application of the theory, we examine the properties of a new change-point test for location. The test is of the plug-in type, obtained from the classical CUSUM test by replacing the mean by the Hodges-Lehmann estimator. It is demonstrated by simulations, and also illustrated by the two
Figure 3: Top: Yearly rainfall (in millimeters) in Argentina from 1884 until 1996 ($n = 113$). Bottom: Change-point processes based on $(\hat h_k - \hat h_n)_{k=1,\ldots,n}$ (solid line), $(\bar X_k - \bar X_n)_{k=1,\ldots,n}$ (dashed line), and $(\hat m_k - \hat m_n)_{k=1,\ldots,n}$ (dotted line).


data examples that the Hodges-Lehmann test outperforms the CUSUM test for heavy-tailed data and significantly reduces the potential harm of gross errors, but essentially behaves like the CUSUM test under normality. We show that the Hodges-Lehmann estimator is clearly preferable to the median for this purpose. A drawback of the Hodges-Lehmann test is its higher computational cost, but this has become negligible with modern computing power.
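The plug-in idea can be sketched in a few lines: the running mean in the CUSUM process is replaced by the running Hodges-Lehmann estimator $\hat h_k$, the median of the pairwise averages of the first $k$ observations. The sketch below is our own illustration under stated assumptions: the function names are hypothetical, it omits the studentization by the estimated long-run variance (so it shows the shape of the change-point processes, not the test statistic itself), and it uses `numpy.median`, which averages the two middle values when the number of pairs is even.

```python
import numpy as np


def hodges_lehmann(x):
    """Median of the pairwise averages (x_i + x_j)/2 over all pairs i < j."""
    x = np.asarray(x, dtype=float)
    i, j = np.triu_indices(len(x), k=1)
    # np.median averages the two middle values for an even number of pairs
    return float(np.median((x[i] + x[j]) / 2.0))


def change_point_processes(x):
    """Unnormalized processes k/sqrt(n) * (T_k - T_n), k = 2, ..., n,
    for T = Hodges-Lehmann estimator and T = sample mean (no studentization)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    k = np.arange(2, n + 1)                  # the HL estimator needs at least one pair
    hl = np.array([hodges_lehmann(x[:kk]) for kk in k])
    means = np.cumsum(x)[1:] / k             # running means for k = 2, ..., n
    scale = k / np.sqrt(n)
    return k, scale * (hl - hl[-1]), scale * (means - x.mean())


# Usage: one jump of height 2 in the middle of n = 240 standard normal observations.
rng = np.random.default_rng(1)
n = 240
x = rng.standard_normal(n)
x[n // 2:] += 2.0
k, proc_hl, proc_cs = change_point_processes(x)
k_hat = int(k[np.argmax(np.abs(proc_hl))])   # position of the maximal deviation
print(k_hat)
```

Recomputing each $\hat h_k$ from scratch requires all $k(k-1)/2$ pairwise averages, so the whole path needs on the order of $n^3$ operations; for series of the length considered here this is negligible.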

The problem of robust univariate location estimation is well studied, with Huber [26] being one of the main contributions, and there are other robust estimators that might perform comparably to the Hodges-Lehmann estimator in this context; see, e.g., Huber and Ronchetti (Chapters 3 and 4) for an overview of robust location estimation. However, besides its good statistical properties, the Hodges-Lehmann estimator possesses an intriguing conceptual simplicity: there are no weight functions, trimming percentages, tuning constants, etc., to choose. Furthermore, a thorough mathematical analysis of robust estimators generally tends to be elaborate, and the literature on functional limit theorems for such estimators is rather limited. The works of Jurečková and Sen go in this direction, but we are not aware of any results for dependent data.

A certain reservation towards the use of robust estimators in general stems from the strong 
focus on moment characteristics as descriptive parameters of distributions. For instance, the mean 
is widely used to describe the central location, and any alternative location measure, such as the 
median or the Hodges-Lehmann estimator, coincides with the mean only under some restrictive 
assumptions on the data distribution (e.g., symmetry). This objection against the use of robust estimators is much less compelling for two-sample or change-point tests. If we consider explicitly
the change-point model described in the introduction, where the observations before and after the 
change-point differ only by a shift, but otherwise follow the same distribution, this shift is picked 
up equally by any proper, translation equivariant location measure, and one is hence free to make 
the choice solely based on the statistical properties of the estimators. 
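The shift argument can be checked numerically: a translation-equivariant location estimator $T$ satisfies $T(y + \delta) = T(y) + \delta$. The minimal sketch below uses skewed (exponential) data, our choice, for which the mean, the median and the Hodges-Lehmann estimator estimate different population quantities, yet all three pick up the same shift $\delta$. The helper `hodges_lehmann` is a hypothetical name for the median of pairwise averages.

```python
import numpy as np


def hodges_lehmann(y):
    """Median of the pairwise averages (y_i + y_j)/2 over all pairs i < j."""
    i, j = np.triu_indices(len(y), k=1)
    return float(np.median((y[i] + y[j]) / 2.0))


rng = np.random.default_rng(0)
y = rng.exponential(size=51)      # skewed data: mean, median and HL estimate different quantities
delta = 2.5                       # shift between the observations before and after the change
estimators = {"mean": np.mean, "median": np.median, "Hodges-Lehmann": hodges_lehmann}

# T(y + delta) - T(y) equals delta (up to rounding) for every translation-equivariant T.
shifts = {name: float(T(y + delta) - T(y)) for name, T in estimators.items()}
levels = {name: float(T(y)) for name, T in estimators.items()}
print(shifts)
print(levels)
```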
Figure 4: Normal quantile plots for the river Elbe discharge data (left) and the Argentina rainfall data (right).


A Proof of Theorem 2.3

Throughout the Appendix, we use $C$ as generic notation for a constant. Its value may change from line to line, but it is always independent of $n$ and all other indices involved in the respective statement. Further, we write $\|\cdot\|_p = (E|\cdot|^p)^{1/p}$ for the $L_p$-norm of a random variable. In Appendix A we prove Theorem 2.3. Appendix B is devoted to the proof of Theorem 2.4.

We start by gathering several important auxiliary results from the literature, which are stated here without proof. The following weak invariance principle for U-statistics is a variant of Theorem 2.5 of Dehling et al. for bounded kernels; there, the invariance principle is stated for unbounded kernels under $(2 + \delta)$-moment assumptions. The bounded case can be proved in the same way, so we omit the proof.

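Proposition A.1. Under Assumptions 1, 2, and 3, the U-statistic process

$$\left(\frac{[ns]}{\sqrt{n}}\Big(U_{[ns]}\big(U^{-1}(p)\big) - p\Big)\right)_{s\in[0,1]}$$

converges weakly in $D[0,1]$ to $\sigma W$, where $W$ is a standard Brownian motion, and

$$\sigma^2 = 4\sum_{r=-\infty}^{\infty}\operatorname{cov}\Big(h_1\big(X_0, U^{-1}(p)\big), h_1\big(X_r, U^{-1}(p)\big)\Big).$$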

We will approximate U-quantiles by U-statistics and will make repeated use of results on U-statistics. Similarly to the Hoeffding decomposition of $h$, we can define the Hoeffding decomposition of the kernel $g$ and set $g_1(x) = Eg(x,Y) - Eg(X,Y)$, where $X, Y \sim F$ are i.i.d. and $F$ is the marginal distribution of the process $(X_i)_{i\in\mathbb{Z}}$. The following lemma is the analogue of Lemma A.2 of Dehling et al. [13] for bounded kernels $g$.

Lemma A.2. Let $(X_n)_{n\in\mathbb{Z}}$ be a stationary and P-near epoch dependent process on $(Z_n)_{n\in\mathbb{Z}}$ with approximating constants $a_l$ and non-increasing function $\Phi$. Let further $g$ be a bounded, symmetric kernel satisfying the variation condition. If there is a sequence of positive numbers $(s_l)_{l\in\mathbb{N}}$ such that $a_l\Phi(s_l) = O(s_l)$, then the sequence $(g_1(X_n))_{n\in\mathbb{Z}}$ is $L_2$-NED on $(Z_n)_{n\in\mathbb{Z}}$, and the approximation constants satisfy $a_{l,2} = O(s_l^{1/2})$.
Lemma A.3. Under our assumptions, we have for any $t \in \mathbb{R}$

$$E\Big(\max_{2^{k-1} < n \le 2^k}\Big|\sum_{1\le i<j\le n} h_2(X_i, X_j, t)\Big|\Big)^2 \le C\,2^{\frac{5}{2}k},$$

and $\sum_{1\le i<j\le n} h_2(X_i, X_j, t) = O\big(n^{5/4}\log^2(n)\big)$ almost surely.

This is Lemma B.6 of Dehling et al. [13].

Proposition A.4. Under our assumptions, we have $U_n(U^{-1}(p)) - p = O\big(\sqrt{\log\log(n)/n}\big)$ almost surely.

Proof. We use the Hoeffding decomposition

$$U_n\big(U^{-1}(p)\big) - p = \frac{2}{n}\sum_{i=1}^n h_1\big(X_i, U^{-1}(p)\big) + \frac{2}{n(n-1)}\sum_{1\le i<j\le n} h_2\big(X_i, X_j, U^{-1}(p)\big).$$

For the second summand, we can use Lemma A.3. By Lemma A.2, the sequence $(h_1(X_n, U^{-1}(p)))_{n\in\mathbb{Z}}$ is $L_2$-NED, so we can use the law of the iterated logarithm as in Theorem 8 of Oodaira and Yoshihara for the first summand. □


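To approximate the U-quantiles by U-statistics, we use the following generalized Bahadur representation.

Proposition A.5. Under our assumptions, we have

(A) $\displaystyle\sup_{|t - U^{-1}(p)| \le C\sqrt{\log\log(n)/n}}\big|U_n(t) - U(t) - U_n\big(U^{-1}(p)\big) + p\big| = O(n^{-5/8})$ and

(B) $\displaystyle R_n = U_n^{-1}(p) - U^{-1}(p) - \frac{p - U_n\big(U^{-1}(p)\big)}{u\big(U^{-1}(p)\big)} = O(n^{-5/8})$

almost surely as $n \to \infty$.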

Proof. To shorten notation, we abbreviate $U^{-1}(p)$ by $t_0$. Keep in mind that $U(t_0) = p$.

Part (A): We set $c_n = 2^{-5k/8}$ for $n = 2^{k-1}+1, \ldots, 2^k$ and $k \in \mathbb{N}$. Note that $U(t)$ and $U_n(t)$ are non-decreasing, so for any $m \in \mathbb{Z}$ and any $t \in [t_0 + mc_n, t_0 + (m+1)c_n]$ we have

$$\begin{aligned}
\big|U_n(t) - U_n(t_0) - U(t) + p\big|
&\le \max\Big\{\big|U_n(t_0 + mc_n) - U_n(t_0) - U(t) + p\big|,\ \big|U_n(t_0 + (m+1)c_n) - U_n(t_0) - U(t) + p\big|\Big\}\\
&\le \max\Big\{\big|U_n(t_0 + mc_n) - U_n(t_0) - U(t_0 + mc_n) + p\big|,\ \big|U_n(t_0 + (m+1)c_n) - U_n(t_0) - U(t_0 + (m+1)c_n) + p\big|\Big\}\\
&\quad + \big|U(t_0 + (m+1)c_n) - U(t_0 + mc_n)\big|.
\end{aligned}$$

Using this inequality for all $t$ such that $|t - t_0| \le C\sqrt{(\log k)/2^k}$, it follows that

$$\begin{aligned}
\sup_{|t - t_0| \le C\sqrt{(\log k)/2^k}}\big|U_n(t) - U_n(t_0) - U(t) + p\big|
&\le \max_{|m| \le C2^{k/8}\log k}\big|U_n(t_0 + mc_n) - U_n(t_0) - U(t_0 + mc_n) + p\big|\\
&\quad + \max_{|m| \le C2^{k/8}\log k}\big|U(t_0 + (m+1)c_n) - U(t_0 + mc_n)\big|,
\end{aligned}$$

and, by the assumption on the differentiability of the U-distribution function,

$$\max_{|m| \le C2^{k/8}\log k}\big|U(t_0 + (m+1)c_n) - U(t_0 + mc_n)\big| = O(c_n).$$

We use the Hoeffding decomposition and treat the linear part and the degenerate part separately:

$$\begin{aligned}
&\max_{|m| \le C2^{k/8}\log k}\big|U_n(t_0 + mc_n) - U_n(t_0) - U(t_0 + mc_n) + p\big|\\
&\quad\le \max_{|m| \le C2^{k/8}\log k}\Big|\frac{2}{n}\sum_{i=1}^n\big(h_1(X_i, t_0 + mc_n) - h_1(X_i, t_0)\big)\Big|\\
&\qquad + \max_{|m| \le C2^{k/8}\log k}\Big|\frac{2}{n(n-1)}\sum_{1\le i<j\le n}\big(h_2(X_i, X_j, t_0 + mc_n) - h_2(X_i, X_j, t_0)\big)\Big|.
\end{aligned}$$

The functions satisfying the variation condition form a vector space, so the variation condition holds for $h_1$ uniformly in some neighborhood of $t_0$. Furthermore, the sequence $(h_1(X_n, t_0))_{n\in\mathbb{Z}}$ is $L_2$-NED by Lemma A.2, and thus the approximation condition of Wendler holds. Applying Theorem 1 of Wendler to the function $g = h_1$, we obtain


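$$\max_{|m| \le C2^{k/8}\log k}\Big|\frac{2}{n}\sum_{i=1}^n\big(h_1(X_i, t_0 + mc_n) - h_1(X_i, t_0)\big)\Big| \le \sup_{|t - t_0| \le C2^{-k/2}\log k}\Big|\frac{2}{n}\sum_{i=1}^n\big(h_1(X_i, t) - h_1(X_i, t_0)\big)\Big| = O(c_n)$$

almost surely. It remains to show that

$$\max_{|m| \le C2^{k/8}\log k}\big|Q_n(t_0 + mc_n) - Q_n(t_0)\big| = O(n^2c_n) \qquad (19)$$

almost surely, where $Q_n(t) = \sum_{1\le i<j\le n} h_2(X_i, X_j, t)$. Recall that for any random variables $Y_1, \ldots, Y_m$ it holds $E\big(\max_{i=1,\ldots,m}|Y_i|\big)^2 \le \sum_{i=1}^m EY_i^2$, and therefore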

$$E\Big(\max_{2^{k-1}<n\le 2^k}\ \max_{|m| \le C2^{k/8}\log k}\frac{|Q_n(t_0 + c_nm) - Q_n(t_0)|}{2^{2k}c_n}\Big)^2 \le \sum_{|m| \le C2^{k/8}\log k}\frac{1}{2^{4k}(2^{-5k/8})^2}\,E\Big(\max_{2^{k-1}<n\le 2^k}|Q_n(t_0 + c_nm) - Q_n(t_0)|\Big)^2,$$

where we have used that $c_n = 2^{-5k/8}$ for $n = 2^{k-1}+1, \ldots, 2^k$. The right-hand side is further bounded by

$$\sum_{|m| \le C2^{k/8}\log k}\frac{2}{2^{11k/4}}\Big[E\Big(\max_{2^{k-1}<n\le 2^k}|Q_n(t_0 + c_nm)|\Big)^2 + E\Big(\max_{2^{k-1}<n\le 2^k}|Q_n(t_0)|\Big)^2\Big] \le C\log(k)\,\frac{2^{21k/8}}{2^{22k/8}} = C2^{-k/8}\log k,$$

where we have applied Lemma A.3. Using the Markov inequality, we conclude that

$$\sum_{k=1}^\infty P\Big(\max_{2^{k-1}<n\le 2^k}\ \max_{|m| \le C2^{k/8}\log k}\frac{|Q_n(t_0 + c_nm) - Q_n(t_0)|}{2^{2k}c_n} > 1\Big) \le \sum_{k=1}^\infty E\Big(\max_{2^{k-1}<n\le 2^k}\ \max_{|m| \le C2^{k/8}\log k}\frac{|Q_n(t_0 + c_nm) - Q_n(t_0)|}{2^{2k}c_n}\Big)^2 \le C\sum_{k=1}^\infty 2^{-k/8}\log k < \infty,$$

and with the Borel-Cantelli lemma, (19) follows, and hence Part (A) is proved.

Part (B): Without loss of generality, let $u(t_0) = 1$; otherwise replace $h(x,y,t)$ by $h(x,y,t/u(t_0))$. We represent $R_n$ as $Z_n(p - U_n(t_0))$ with

$$Z_n(x) = \big(U_n(\cdot + t_0) - U_n(t_0)\big)^{-1}(x) - x = U_n^{-1}\big(x + U_n(t_0)\big) - x - t_0,$$

where $(U_n(\cdot + t_0) - U_n(t_0))^{-1}$ is the inverse function of $x \mapsto U_n(x + t_0) - U_n(t_0)$. By Proposition A.4, we have $\limsup_{n\to\infty}\sqrt{n/\log\log n}\,|U_n(t_0) - p| \le C$ almost surely. By the differentiability assumption and Part (A), we have

$$\begin{aligned}
\sup_{|x| \le C\sqrt{(\log\log n)/n}}\big|U_n(x + t_0) - U_n(t_0) - x\big|
&\le \sup_{|x| \le C\sqrt{(\log\log n)/n}}\big|U_n(x + t_0) - U(x + t_0) - U_n(t_0) + p\big|\\
&\quad + \sup_{|x| \le C\sqrt{(\log\log n)/n}}\big|U(x + t_0) - p - x\big| = O(c_n).
\end{aligned}$$

Then, by Theorem 1 of Vervaat, $|R_n| \le \sup_{|x| \le C\sqrt{(\log\log n)/n}}|Z_n(x)| = O(c_n)$, so Part (B) of Proposition A.5 is proved. □
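Part (B) says that the remainder $R_n$ is of smaller order than the linear term $(p - U_n(t_0))/u(t_0)$. A hedged numerical illustration for the Hodges-Lehmann kernel $g(x,y) = (x+y)/2$ with i.i.d. standard normal data, our choice because there $t_0 = U^{-1}(1/2) = 0$ and $u(t_0) = 1/\sqrt{\pi}$ are known in closed form; `u_quantile` is a hypothetical helper implementing the generalized inverse $U_n^{-1}(p) = \inf\{t : U_n(t) \ge p\}$.

```python
import math
import numpy as np


def u_quantile(values, p):
    """Generalized inverse U_n^{-1}(p) = inf{t : U_n(t) >= p} of the empirical kernel-value distribution."""
    v = np.sort(np.asarray(values, dtype=float))
    return float(v[math.ceil(p * len(v)) - 1])


rng = np.random.default_rng(2)
n, p = 1000, 0.5
x = rng.standard_normal(n)
i, j = np.triu_indices(n, k=1)
g = (x[i] + x[j]) / 2.0                # kernel values g(X_i, X_j), Hodges-Lehmann case

t0 = 0.0                               # U^{-1}(1/2) = 0 for N(0, 1) data
u_t0 = 1.0 / math.sqrt(math.pi)        # (X + Y)/2 ~ N(0, 1/2) has density 1/sqrt(pi) at 0
U_n_t0 = float(np.mean(g <= t0))       # empirical U-distribution function U_n(t0)

linear = (p - U_n_t0) / u_t0           # leading term of the Bahadur representation
R_n = u_quantile(g, p) - t0 - linear   # remainder R_n of Part (B)
print(f"linear term {linear:+.5f}, remainder {R_n:+.5f}")
```

In such runs the linear term is of the order $n^{-1/2}$, while $R_n$ is typically much smaller, in line with $R_n = O(n^{-5/8})$.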

We are now ready to prove Theorem 2.3.


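Proof of Theorem 2.3. We write

$$\frac{[ns]}{\sqrt{n}}\Big(U_{[ns]}^{-1}(p) - U^{-1}(p)\Big) = \frac{[ns]}{\sqrt{n}}\,\frac{p - U_{[ns]}\big(U^{-1}(p)\big)}{u\big(U^{-1}(p)\big)} + \frac{[ns]}{\sqrt{n}}\,R_{[ns]},$$

where $R_n$ is as in Proposition A.5. By Proposition A.1,

$$\left(\frac{[ns]}{\sqrt{n}}\,\frac{p - U_{[ns]}\big(U^{-1}(p)\big)}{u\big(U^{-1}(p)\big)}\right)_{s\in[0,1]}$$

converges weakly in $D[0,1]$ to $\sigma W$, where $W$ is a standard Brownian motion and the variance is given by

$$\sigma^2 = \frac{4}{u^2\big(U^{-1}(p)\big)}\sum_{k=-\infty}^{\infty}\operatorname{cov}\Big(h_1\big(X_0, U^{-1}(p)\big), h_1\big(X_k, U^{-1}(p)\big)\Big).$$

By Proposition A.5, we have $R_n = O(n^{-5/8})$ almost surely and thus $|nR_n| \le Cn^{3/8}$ almost surely. Consequently,

$$\sup_{s\in[0,1]}\Big|\frac{[ns]}{\sqrt{n}}\,R_{[ns]}\Big| \le \frac{1}{\sqrt{n}}\max_{k\le n}|kR_k| \le \frac{1}{\sqrt{n}}\,Cn^{3/8} \longrightarrow 0$$

almost surely, and Slutsky's theorem completes the proof. □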


B Proof of Theorem 2.4

The proof of Theorem 2.4 consists of two main steps: showing the convergence of the density estimator $\hat u_n$ to $u(U^{-1}(p))$, and showing the convergence of the cumulative autocovariance part. The former is the content of Lemma B.2. The following Lemma B.1 is an essential tool for the latter step.

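Lemma B.1. Under our assumptions, we have

$$\sum_{r=-(n-1)}^{n-1} W\Big(\frac{|r|}{b_n}\Big)\frac{1}{n}\sum_{i=1}^{n-|r|}\Big(\hat h_1(X_i, t_n)\hat h_1(X_{i+|r|}, t_n) - \hat h_1(X_i, t_0)\hat h_1(X_{i+|r|}, t_0)\Big) \longrightarrow 0$$

in probability as $n \to \infty$, where we have abbreviated $t_0 = U^{-1}(p)$ and $t_n = U_n^{-1}(p)$.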

Proof. We first have a look at the covariance estimator for a fixed lag $r \ge 0$. We will use the facts that $|\hat h_1| \le 1$, $|h_1| \le 1$, and that $h$ is non-decreasing in the third argument. Then

$$\begin{aligned}
&\Big|\frac{1}{n}\sum_{i=1}^{n-r}\Big(\hat h_1(X_i, t_n)\hat h_1(X_{i+r}, t_n) - \hat h_1(X_i, t_0)\hat h_1(X_{i+r}, t_0)\Big)\Big|\\
&\quad\le \frac{1}{n}\sum_{i=1}^{n-r}\big|\hat h_1(X_i, t_n) - \hat h_1(X_i, t_0)\big| + \frac{1}{n}\sum_{i=1}^{n-r}\big|\hat h_1(X_{i+r}, t_n) - \hat h_1(X_{i+r}, t_0)\big|\\
&\quad\le \frac{2}{n}\sum_{i=1}^{n}\Big|\frac{1}{n}\sum_{j=1}^n\big(h(X_i, X_j, t_n) - h(X_i, X_j, t_0)\big)\Big| + 2\Big|\frac{1}{n^2}\sum_{j_1,j_2=1}^n\big(h(X_{j_1}, X_{j_2}, t_n) - h(X_{j_1}, X_{j_2}, t_0)\big)\Big|\\
&\quad\le 4\Big|\frac{1}{n^2}\sum_{j_1,j_2=1}^n\big(h(X_{j_1}, X_{j_2}, t_n) - h(X_{j_1}, X_{j_2}, t_0)\big)\Big|\\
&\quad\le 4\Big|\frac{1}{n^2}\sum_{j_1,j_2=1}^n\big(h(X_{j_1}, X_{j_2}, t_n) - h(X_{j_1}, X_{j_2}, t_0) - U(t_n) + p\big)\Big| + 4\big|U(t_n) - p\big|.
\end{aligned}$$


First note that the right-hand side of this chain of inequalities does not depend on $r$. By Propositions A.4 and A.5, $|t_n - t_0| = O\big(\sqrt{\log\log(n)/n}\big)$ almost surely. So we can conclude with the help of Proposition A.5 that

$$\begin{aligned}
\Big|\frac{1}{n^2}\sum_{j_1,j_2=1}^n\big(h(X_{j_1}, X_{j_2}, t_n) - h(X_{j_1}, X_{j_2}, t_0) - U(t_n) + p\big)\Big|
&\le \big|U_n(t_n) - U_n(t_0) - U(t_n) + p\big| + \Big|\frac{1}{n^2}\sum_{j=1}^n\big(h(X_j, X_j, t_n) - h(X_j, X_j, t_0) - U(t_n) + p\big)\Big|\\
&\le \sup_{|t - t_0| \le C\sqrt{\log\log(n)/n}}\big|U_n(t) - U(t) - U_n(t_0) + p\big| + \frac{C}{n} = O(n^{-5/8})
\end{aligned}$$

almost surely. From the differentiability assumption and Theorem 2.3, we conclude that $|U(t_n) - p| \le C|t_n - t_0| = O_P(n^{-1/2})$, and finally arrive at


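$$\Big|\sum_{r=-(n-1)}^{n-1} W\Big(\frac{|r|}{b_n}\Big)\frac{1}{n}\sum_{i=1}^{n-|r|}\Big(\hat h_1(X_i, t_n)\hat h_1(X_{i+|r|}, t_n) - \hat h_1(X_i, t_0)\hat h_1(X_{i+|r|}, t_0)\Big)\Big| \le \sum_{r=-(n-1)}^{n-1}\Big|W\Big(\frac{|r|}{b_n}\Big)\Big|\,O_P(n^{-1/2}) = O_P\big(b_nn^{-1/2}\big) \longrightarrow 0$$

in probability as $n \to \infty$. The proof is complete. □

Lemma B.2. Under our assumptions,

$$\hat u_n = \frac{2}{n(n-1)d_n}\sum_{1\le i<j\le n} K\Big(\frac{g(X_i, X_j) - U_n^{-1}(p)}{d_n}\Big) \longrightarrow u\big(U^{-1}(p)\big)$$

in probability as $n \to \infty$.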

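Proof. We introduce an upper kernel $K_{u,n}$ and a lower kernel $K_{l,n}$ by

$$K_{u,n}(t) = \sup_{|t'-t| \le \sqrt{\log n/n}\,/d_n} K(t') \quad\text{and}\quad K_{l,n}(t) = \inf_{|t'-t| \le \sqrt{\log n/n}\,/d_n} K(t'),$$

and further an upper estimate $\hat u_{u,n}$ and a lower estimate $\hat u_{l,n}$ by

$$\hat u_{u,n} = \frac{2}{n(n-1)d_n}\sum_{1\le i<j\le n} K_{u,n}\Big(\frac{g(X_i, X_j) - U^{-1}(p)}{d_n}\Big), \qquad \hat u_{l,n} = \frac{2}{n(n-1)d_n}\sum_{1\le i<j\le n} K_{l,n}\Big(\frac{g(X_i, X_j) - U^{-1}(p)}{d_n}\Big).$$

Since $|U_n^{-1}(p) - U^{-1}(p)| = O\big(\sqrt{\log\log(n)/n}\big)$ almost surely (Propositions A.4 and A.5), we almost surely have $\hat u_{l,n} \le \hat u_n \le \hat u_{u,n}$ for all but a finite number of $n$. Hence it suffices to show that $\hat u_{u,n} \to u(U^{-1}(p))$ and $\hat u_{l,n} \to u(U^{-1}(p))$ in probability as $n \to \infty$. We will focus on $\hat u_{u,n}$, as the proof for $\hat u_{l,n}$ is analogous. Note that $\hat u_{u,n}$ is a U-statistic with symmetric kernel $k_n(x,y) = d_n^{-1}K_{u,n}\big((g(x,y) - U^{-1}(p))/d_n\big)$ depending on $n$. We use the Hoeffding decomposition

$$u_n = Ek_n(X,Y), \quad k_{1,n}(x) = Ek_n(x,Y) - u_n, \quad k_{2,n}(x,y) = k_n(x,y) - k_{1,n}(x) - k_{1,n}(y) - u_n,$$

where $X, Y$ are independent with the same distribution as $X_0$. We obtain

$$\hat u_{u,n} = u_n + \frac{2}{n}\sum_{i=1}^n k_{1,n}(X_i) + \frac{2}{n(n-1)}\sum_{1\le i<j\le n} k_{2,n}(X_i, X_j). \qquad (20)$$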


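We treat the three summands on the right-hand side separately. By our assumptions, $K$ has a bounded support, so let $K(x) = 0$ for $|x| > M$. Because the density $u$ is continuous and $K$ integrates to 1, we can conclude that

$$\begin{aligned}
u_n - u\big(U^{-1}(p)\big) &= \int\frac{1}{d_n}K_{u,n}\Big(\frac{x - U^{-1}(p)}{d_n}\Big)u(x)\,dx - u\big(U^{-1}(p)\big)\\
&= \int K_{u,n}(x)\,u\big(xd_n + U^{-1}(p)\big)\,dx - u\big(U^{-1}(p)\big)\\
&\le \int K_{u,n}(x)\big|u\big(xd_n + U^{-1}(p)\big) - u\big(U^{-1}(p)\big)\big|\,dx + u\big(U^{-1}(p)\big)\Big(\int K_{u,n}(x)\,dx - 1\Big)\\
&\le \Big(2M + \frac{2\sqrt{\log n}}{\sqrt{n}\,d_n}\Big)\sup_{x\in\mathbb{R}}K(x)\sup_{|x| \le Md_n + \sqrt{\log n/n}}\big|u\big(x + U^{-1}(p)\big) - u\big(U^{-1}(p)\big)\big| + u\big(U^{-1}(p)\big)\Big(\int K_{u,n}(x)\,dx - 1\Big),
\end{aligned}$$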
which converges to 0 as $n \to \infty$ since $d_n \to 0$ and, by our assumptions on $d_n$, $\sqrt{\log n/n}/d_n \to 0$. To prove the convergence of the second and third summand in the Hoeffding decomposition (20), we first gather some properties of the sequence $(k_n)_{n\in\mathbb{N}}$. The kernel $K$ is Lipschitz continuous for some constant $L_1$, that is, $|K(x) - K(y)| \le L_1|x - y|$; hence the mapping $x \mapsto d_n^{-1}K(x/d_n)$ is Lipschitz continuous with constant $L_1/d_n^2$, and $k_n(x,y) = d_n^{-1}K_{u,n}\big((g(x,y) - U^{-1}(p))/d_n\big)$ satisfies the variation condition with constant $L' = Cd_n^{-2}$. Furthermore, $k_n(x,y) \le M' = C/d_n$ and $E|k_n(X,Y)| \le C$ for independent $X, Y$, and thus $Ek_n^2(X,Y) \le C/d_n$. By the proof of Lemma A.2, we find that $(k_{1,n}(X_i))_{i\in\mathbb{Z}}$ is $L_2$-near epoch dependent, with approximation constants $a_l$ whose dependence on $d_n$ follows from that proof. As in the proof of Lemma C.1 of Dehling et al. [13], the covariances $|E(k_{1,n}(X_1)k_{1,n}(X_{1+k}))|$ can then be bounded in terms of the mixing coefficients $\beta_k$ and the approximation constants $a_k$, using conditional expectations given $\mathcal{F}_i^j$, where $\mathcal{F}_i^j$ denotes the σ-field generated by $Z_i, \ldots, Z_j$. By stationarity, we obtain

$$E\Big(\frac{2}{n}\sum_{i=1}^n k_{1,n}(X_i)\Big)^2 \le \frac{4}{n}\sum_{k=-\infty}^{\infty}\big|E\big(k_{1,n}(X_0)k_{1,n}(X_k)\big)\big|,$$

and the resulting bound converges to 0 since $nd_n \to \infty$. So the second summand of (20) converges to 0. For the degenerate part, we use that $k_{2,n}(x,y)$ is a degenerate kernel bounded by $C/d_n$, so we can prove, similarly to Lemma B.2 of Dehling et al. [13], an $L_2$-bound on $k_{2,n}(X_i, X_{i+k+2l}) - k_{2,n}(X_{i,l}, X_{i+k+2l,l})$ in terms of $\sqrt{L'\varepsilon}$, $M'$, $a_l$, and $\Phi(\varepsilon)$, where we write $X_{i,l}$ short for $f_l(Z_{i-l}, \ldots, Z_{i+l})$, and can conclude that

$$E\Big|\frac{2}{n(n-1)}\sum_{1\le i<j\le n}\big(k_{2,n}(X_i, X_j) - k_{2,n}(X_{i,l}, X_{j,l})\big)\Big| \le Cn^{-1/4}\big(\sqrt{ML'} + M'\big) \longrightarrow 0$$

by our assumptions on $d_n$. Similarly (compare Lemma B.4 of Dehling et al. [13]), we get

$$E\Big|\frac{2}{n(n-1)}\sum_{1\le i<j\le n}\big(k_{2,n}(X_{i,l}, X_{j,l}) - k_{2,l,n}(X_{i,l}, X_{j,l})\big)\Big| \longrightarrow 0,$$

where $k_{2,l,n}$ is defined by the Hoeffding decomposition of $k_n$ with respect to the distribution of $X_{0,l}$. Finally, as in Lemma B.5 of Dehling et al. [13],

$$\big|E\,k_{2,l,n}(X_{i_1,l}, X_{i_2,l})\,k_{2,l,n}(X_{i_3,l}, X_{i_4,l})\big| \le C(M')^2\beta_{m-2l}$$

with $m = \max\{i_{(2)} - i_{(1)}, i_{(4)} - i_{(3)}\}$, where $i_{(1)}, \ldots, i_{(4)}$ are the ordered indices $i_1, i_2, i_3, i_4$, and thus, for a suitable choice of $l = l_n$,

$$E\Big(\frac{2}{n(n-1)}\sum_{1\le i<j\le n} k_{2,l,n}(X_{i,l}, X_{j,l})\Big)^2 \longrightarrow 0.$$

The convergence of $\frac{2}{n(n-1)}\sum_{1\le i<j\le n} k_{2,n}(X_i, X_j)$ to 0 then follows along the lines of the proof of Lemma A.3 (Lemma B.6 of Dehling et al. [13]), and hence $\hat u_{u,n}$ converges to $u(U^{-1}(p))$, and the proof is complete. □
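The density estimator $\hat u_n$ of Lemma B.2 is straightforward to compute. The following is a minimal sketch for the Hodges-Lehmann kernel $g(x,y) = (x+y)/2$, assuming an Epanechnikov kernel for $K$ (Lipschitz, bounded support, integrating to 1, as required above) and the bandwidth $d_n = n^{-1/5}$; both are our illustrative choices, not necessarily those used in the simulations, and the function names are hypothetical. For i.i.d. $N(0,1)$ data the target value is $u(U^{-1}(1/2)) = 1/\sqrt{\pi}$.

```python
import math
import numpy as np


def epanechnikov(z):
    """Lipschitz kernel with support [-1, 1] that integrates to 1 (illustrative choice of K)."""
    return np.where(np.abs(z) <= 1.0, 0.75 * (1.0 - z ** 2), 0.0)


def u_density_estimate(x, p, d_n, kernel=epanechnikov):
    """u_hat_n = 2/(n(n-1)d_n) * sum_{i<j} K((g(X_i, X_j) - U_n^{-1}(p))/d_n) for g(x, y) = (x + y)/2."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    i, j = np.triu_indices(n, k=1)
    g = np.sort((x[i] + x[j]) / 2.0)
    t_n = g[math.ceil(p * len(g)) - 1]            # U_n^{-1}(p), generalized inverse
    return 2.0 / (n * (n - 1) * d_n) * float(np.sum(kernel((g - t_n) / d_n)))


# Usage with i.i.d. N(0, 1) data, where u(U^{-1}(1/2)) = 1/sqrt(pi).
rng = np.random.default_rng(3)
n = 500
u_hat = u_density_estimate(rng.standard_normal(n), p=0.5, d_n=n ** (-0.2))
print(f"u_hat = {u_hat:.3f}, target = {1 / math.sqrt(math.pi):.3f}")
```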
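Proof of Theorem 2.4. We can rewrite the variance estimator as

$$\begin{aligned}
\hat\sigma_n^2 &= \frac{4}{\hat u_n^2}\sum_{r=-(n-1)}^{n-1} W\Big(\frac{|r|}{b_n}\Big)\frac{1}{n}\sum_{i=1}^{n-|r|}\hat h_1(X_i, t_n)\hat h_1(X_{i+|r|}, t_n)\\
&= \frac{4}{\hat u_n^2}\sum_{r=-(n-1)}^{n-1} W\Big(\frac{|r|}{b_n}\Big)\frac{1}{n}\sum_{i=1}^{n-|r|} h_1(X_i, t_0)h_1(X_{i+|r|}, t_0)\\
&\quad + \frac{4}{\hat u_n^2}\sum_{r=-(n-1)}^{n-1} W\Big(\frac{|r|}{b_n}\Big)\frac{1}{n}\sum_{i=1}^{n-|r|}\Big(\hat h_1(X_i, t_0)\hat h_1(X_{i+|r|}, t_0) - h_1(X_i, t_0)h_1(X_{i+|r|}, t_0)\Big)\\
&\quad - \frac{4}{\hat u_n^2}\sum_{r=-(n-1)}^{n-1} W\Big(\frac{|r|}{b_n}\Big)\frac{1}{n}\sum_{i=1}^{n-|r|}\Big(\hat h_1(X_i, t_0)\hat h_1(X_{i+|r|}, t_0) - \hat h_1(X_i, t_n)\hat h_1(X_{i+|r|}, t_n)\Big).
\end{aligned}$$

By Lemma B.2, the density estimator $\hat u_n$ converges to $u(U^{-1}(p))$. Hence the first summand converges to $\sigma^2$ by Theorem 2.1 of de Jong and Davidson and Slutsky's theorem. The second and the third summand converge to 0 by Lemma C.3 of Dehling et al. [13] and Lemma B.1, respectively. □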
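The first line of the decomposition above is also a recipe for computing $\hat\sigma_n^2$. The sketch below implements it for the Hodges-Lehmann kernel, with $\hat h_1(X_i, t_n) = \frac{1}{n}\sum_j h(X_i, X_j, t_n) - \frac{1}{n^2}\sum_{j_1,j_2} h(X_{j_1}, X_{j_2}, t_n)$ as used in the proof of Lemma B.1. Several pieces are our assumptions, not the paper's tuning: $W$ is taken to be the Bartlett kernel, $K$ the Epanechnikov kernel, $b_n = n^{1/3}$ and $d_n = n^{-1/5}$, and the function name is hypothetical. For i.i.d. $N(0,1)$ data the limit is $\sigma^2 = 4\,\mathrm{var}(h_1)/u^2 = 4 \cdot \frac{1}{12} \cdot \pi = \pi/3$.

```python
import math
import numpy as np


def hl_long_run_variance(x, b_n, d_n, p=0.5):
    """Sketch of sigma_hat_n^2 = 4/u_hat_n^2 * sum_r W(|r|/b_n) (1/n) sum_i h1_hat(X_i,t_n) h1_hat(X_{i+|r|},t_n)
    for the Hodges-Lehmann kernel g(x, y) = (x + y)/2; W = Bartlett, K = Epanechnikov (illustrative)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    G = (x[:, None] + x[None, :]) / 2.0               # g(X_i, X_j) for all i, j
    g = np.sort(G[np.triu_indices(n, k=1)])
    t_n = g[math.ceil(p * len(g)) - 1]                # U_n^{-1}(p)

    # density estimate u_hat_n of Lemma B.2 with the Epanechnikov kernel
    z = (g - t_n) / d_n
    K = np.where(np.abs(z) <= 1.0, 0.75 * (1.0 - z ** 2), 0.0)
    u_hat = 2.0 / (n * (n - 1) * d_n) * float(np.sum(K))

    # h1_hat(X_i, t_n) = (1/n) sum_j h(X_i, X_j, t_n) - (1/n^2) sum_{j1, j2} h(X_j1, X_j2, t_n)
    H = (G <= t_n).astype(float)
    h1 = H.mean(axis=1) - H.mean()

    # weighted sum of autocovariances; lags r and -r contribute the same term
    acc = float(np.dot(h1, h1)) / n
    for r in range(1, n):
        w = 1.0 - r / b_n                              # Bartlett weight W(r/b_n)
        if w <= 0.0:
            break
        acc += 2.0 * w * float(np.dot(h1[:-r], h1[r:])) / n
    return 4.0 * acc / u_hat ** 2


# Usage: i.i.d. N(0, 1) data, for which the target value is pi/3.
rng = np.random.default_rng(4)
n = 800
sigma2_hat = hl_long_run_variance(rng.standard_normal(n), b_n=n ** (1 / 3), d_n=n ** (-0.2))
print(f"sigma2_hat = {sigma2_hat:.3f}, pi/3 = {math.pi / 3:.3f}")
```

Forming the full $n \times n$ matrix of kernel values keeps the sketch short; for much longer series one would process the pairs in blocks.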